School Database Revamp - Final Report

This is the final report for Project 112 - School Database Revamp.

Contents

  1. Introduction
  2. Deliverables
  3. Additional Deliverables
  4. Codebase
  5. Timeline
  6. Effort and Analysis
  7. Approach or The Good, the Bad and the Ugly
  8. Documentation
  9. Ongoing and Future Work

1. Introduction

The primary aim of this project was to re-factor the old pre-EUCLID School Database into a more modern and maintainable solution which would meet the School's ongoing needs for information management. The secondary aim of this project was to add a "managed" database for PGR records such as funding concessions and milestones to replace the "single admin user developed and managed" system (originally Project 020 - IGS Database).

2. Deliverables

  1. A new central School Database. This was achieved. The old Ingres database system, proprietary functional processing, locally developed GUI and report SQL were migrated to work with a PostgreSQL database system configured entirely by LCFG. No specific functionality was added at this stage, although there were fringe benefits from the move to PostgreSQL such as performance, robustness and the fixing of a number of long-standing user interaction bugs attributed to issues with Ingres. Beyond this there was no user impact at this point; users continued to use the same old interfaces for access, updates and reporting.
  2. A new user interface. This was achieved. The old Applixware/SHELF GUI was replaced with TheonUI, a client/server web application which, while retaining the same essential design, was re-factored into a Javascript client and Python server. By using the same design, all the old custom forms used by the ITO worked without modification under TheonUI (equivalently, albeit with some presentational differences).
  3. A new report generation framework. This was largely achieved. The old locally managed "conduits" system was re-factored into the TheonPortal. The "gurgle" software remains in use, predominantly now as an SQL dataset templating language rather than as a "processing" language. We would have liked to do more in the reporting area, and some of the more aspirational user-requested enhancements were not met. The old "genrep" shell command for on-demand report generation was replaced by the Refresh link and the Builder feature (where a report has runtime options) on the TheonPortal. Note that "genrep" is still used for some legacy applications which have not been migrated yet, but all other menu options have been culled.
  4. A new PGR records database. This was achieved. The locally developed data model was integrated, new custom forms produced and a Portal (online report browsing) presence added.
  5. Fixes and/or resolution to RT:38375, RT:39329, RT:42270 and RT:41588. RT:38375 and RT:42270 were met. RT:39329 is pending InfHR process migration. RT:41588 is out of scope and would now constitute a separate project.
  6. A full statement of what is held for the purposes of compliance with Data Protection legislation. There is a provisional DPA Compliance Statement but we expect that to evolve beyond the natural closure of this project.

3. Additional Deliverables

A certain amount of work "related" to the core deliverables above was carried out. This work was not part of the original project but is included here as it "informed" the main project.

  1. A new data model. This was achieved. This was a complete refresh of the existing model to incorporate a number of critical changes that were impeding admin staff, not least a move from the old Class/Module teaching and assessment structures to the new EUCLID Course structure. The new data model also helped to determine what technical functionality would be required both of the primary user interface as well as the reporting system.
  2. Move to fully version controlled structure. This was achieved. All aspects of "Theon" are now under version control.
  3. Introduce a bug tracking and development web site. This was achieved using Trac, which is now a generic service configured by LCFG.
  4. Improvements to lcfg-pgluser which is now a generic PostgreSQL account management tool. This was achieved, as well as improvements to the generic lcfg-postgresql component.
  5. A new Duty records database. This was achieved. The locally developed (by admin staff) data model was integrated, new custom forms produced in the user interface and a Portal presence added.

4. Codebase

The TheonPortal framework was largely a tidying up of the pre-existing "conduits" structure. It consists of approximately 700 lines of shell script, a small Python CGI, about 1000 lines of CSS (largely third party) and supporting resources such as images and configuration files. Each "article" (report) on the Portal is a separate file; there are currently nearly 300 of these (although many were developed after this project), totalling about 20,000 lines of "gurgle" code, predominantly SQL and formatted output blocks. The articles share common header files (specific to the Portal) totalling about 6,000 lines.

The remainder of the "conduits" tree (supporting the Portal and legacy applications yet to be migrated) is about 1000 lines of "gurgle" code. Some of this supports the generation of "schema" (until the change management project is completed). There are about 600 separate files covering this processing, totalling 16,000 lines of "gurgle" code; these were all either developed or adapted/migrated for the new data model. In addition there are 100 header files supporting the "schema", about 13,000 lines of "gurgle" code. In total the TheonPortal framework supporting the Portal itself and all its constituent articles, the "conduits" framework supporting the TheonPortal framework, and the interim "schema" generation process amount to about 60,000 lines of code.

Over and above this, all the pre-existing legacy conduits were migrated to PostgreSQL (since, at that stage of the project, they would still have been needed; in fact many still are, pending migration of the final applications). Some were also partially adapted to work with both the legacy and Theon data models due to a split of data source. There are about 550 of these, totalling about 61,000 lines of "gurgle"/SQL code.

The interim "schema" framework (pending change management) over and above what is described is held in a central database that in turn produces an XML Schema held in multiple files. This comprises nearly 2700 files and over 460,000 lines of XML Schema code. This is processed by XSLT templates into the master DDL and its documentation as well as meta data files for the TheonUI. This XSLT codebase is about 35 files and 13,000 lines. This aspect also includes the schema processing tool "schematic" a Python tool of about 1,000 lines. The central database holds the data model definition as well as the construction of all the desktops and panels used on the TheonUI. The data model definition consists of about 63,000 lines of functional SQL, including roles and permissions.

The TheonUI consists of a Javascript client using the YUI2 framework and a Python server using WSGI. The client is about 8000 lines of commented code. It also includes about 1000 lines of CSS. The Python server is about 5000 commented lines.
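As an illustration of this client/server split, below is a minimal WSGI sketch of the kind of thing the Python server does: answering JSON requests from a browser-based Javascript client. The endpoint path and payload are invented; the real TheonUI server is of course far larger and is driven by the generated metadata.

    # Minimal WSGI sketch; the "/metadata" endpoint and its payload are invented
    # for illustration and do not reflect the actual TheonUI protocol.
    import json
    from wsgiref.simple_server import make_server

    def application(environ, start_response):
        """Answer a metadata request from the Javascript client with JSON."""
        if environ["PATH_INFO"] == "/metadata":
            body = json.dumps({"desktops": ["ITO", "IGS", "Duty"]}).encode("utf-8")
            status = "200 OK"
        else:
            body = b'{"error": "unknown path"}'
            status = "404 Not Found"
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(body)))])
        return [body]

    if __name__ == "__main__":
        make_server("localhost", 8080, application).serve_forever()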

There are also a number of supporting scripts and tools, too numerous to mention here. The Theon project also "absorbed" the "gurgle" programme in order to make small enhancements as necessary to support the TheonPortal framework; this is a C programme of about 7,000 lines. The "eventd" daemon (for automated handling of notify event processing) was also ported from Ingres to PostgreSQL. This originally handled the mark processing queue, but in PostgreSQL the performance benefits were sufficient to bypass queue handling and process marks directly. Instead it continues to be used for the new "temporal processing" introduced largely to handle date-triggered currency and status changes; however, this mechanism is also being deprecated, and with the introduction of background worker processes in PostgreSQL 9.3 we expect "eventd" ultimately to be dropped. It is about 300 lines of C code. The Python "pgluser" script, as used by "lcfg-pgluser" for PostgreSQL account management, is currently integrated into the repository and is about 1000 lines of Python.
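For context, the sketch below shows how PostgreSQL LISTEN/NOTIFY can drive this style of event processing from Python using psycopg2. It is not the actual "eventd" daemon (which is written in C); the connection string, channel name and handling are hypothetical.

    # Hypothetical LISTEN/NOTIFY loop; not the real "eventd" daemon, and the
    # connection string and channel name are invented for illustration.
    import select
    import psycopg2

    conn = psycopg2.connect("dbname=theon")
    conn.set_session(autocommit=True)
    cur = conn.cursor()
    cur.execute("LISTEN temporal_event;")

    while True:
        # Block until the server sends a notification (or time out after 60s).
        if select.select([conn], [], [], 60) == ([], [], []):
            continue
        conn.poll()
        while conn.notifies:
            note = conn.notifies.pop(0)
            print("processing event:", note.channel, note.payload)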

The current Theon repository (which does not reflect just work on this project of course but much work subsequent to its completion) is about 4,000 files or 2 million lines of code and data (the bulk of this is automatically generated however).

5. Timeline

The project was started in Q2 of 2009. The first step was to develop a comprehensive review of the way ahead. The review is available.

Following on from this, in Q3 and Q4 2009 the backend database shift from Ingres to PostgreSQL was carried out. The actual switchover for users was at the end of January 2010. Part of this migration was an aborted attempt at benchmarking, prompted by concerns that PostgreSQL would not perform as well as Ingres. It proved difficult to port the TPC benchmarks, it would have been costly to roll our own, and the work to migrate to PostgreSQL was in any event already underway; so although the benchmarking was abandoned relatively early on, it still cost some effort. The shift involved porting the old client application (although original work on this had been done years previously, which had at least proved the principle), all the SQL conduits for reports, all the other scripts that interacted with the database (such as the inventory and account management tools) and the core functional processing, over and above the preparation and conversion of the raw data itself. The most critical and time-consuming part of the functional processing porting effort was the code responsible for mark processing for the Board of Examiners; this had to be very carefully altered (due to functional incompatibilities between Ingres and PostgreSQL). It should be noted that, despite the great complexity in coordinating such a system migration, there was not a single hitch post switchover.

Part of the move to PostgreSQL also saw the development of the XML Schema model with related XSLT to underpin all future schema development. The decision to take this approach was made very early on (Q3 2009), as it would underpin provision of data for the model DDL and, critically, the data structures for the UI client and server. An interim management framework, adapted from what was there before, was put in place to maintain this.

ISS were commissioned to produce a "requirements" document outlining a roadmap for what they would like to see in the new service. There was never any expectation of being able to meet everything in it (they were given a remit to think "out of the box" as much as possible), but it would provide a scope so that the design of the system would not preclude future implementation. As it turns out, a very large proportion of these requirements were met in one form or another. The requirements document is available.

During Q1 and Q2 2010 the entire data model was revised. This was to make a number of key changes as well as to bring in improvements identified by ISS. Key enhancements added here were integrated support for the then standalone Access databases for PGR and Duty records. Also important was the move from the old class/module assessment structures to a course-based model. The revised model document was made available in mid-May 2010, and a snapshot of the data model as it stood then is available.

The client/server API specification was produced in March 2010. This was a breakout of the old native client application into two parts with a defined inter-communication protocol. This was critical as it allowed work to proceed on the client and server relatively independently, in parallel, from that point on until initial testing. The picture below shows the re-factor split, with the original native client on the left and the new web-based client on the right; the colours are the same on both sides to show how functional modules were translated from a single client to a client/server model.

TheonRefactor11.png

The actual API specification is available.
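Purely as an illustration of the kind of exchange such a protocol defines (the actual message formats are in the specification linked above, and all field names here are invented), a client-to-server query and its reply might look something like this:

    # Invented example of a client query descriptor and a server reply; the
    # real API specification defines the actual formats.
    import json

    request = {
        "action": "query",
        "panel": "student",                              # panel shown on the client
        "filter": {"year": 2011, "programme": "PGR"},    # user-entered constraints
        "columns": ["uun", "name", "status"],
    }
    print(json.dumps(request))

    reply = {"rows": [["s0123456", "A Student", "active"]], "count": 1}
    print(json.dumps(reply))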

The overall architecture of the new service had been defined by this point as well as the detail of the UI architecture.

In Q2 2010 a prototype Javascript client was developed to test frameworks and the viability (interactivity and performance) of the proposed web browser solution. This was then completely scrapped and re-written while the Python server was under development in Q3 and Q4 2010. In mid-July 2010 the first rough cast of the TheonUI was made available (as a testing demo only) for ISS in preparation for IGS; there were no custom desktops at this point. In early October 2010 the IGS got their first look at the desktops for their review and correction. This was followed in November 2010 by a "live" deployment of the Duty desktops (custom forms) and reports. The Duty process was used as a test model for user acceptance as it was managed by only one user and was not replacing anything in the legacy systems; this provided good opportunities for user engagement and feedback.

At the same time, in Q4 2010, the IGS replacement database system was still being developed. This was also used as a test model for user acceptance as it would be used solely by the IGS admin and was not a replacement for an in-use system. In Q1 2011 this went live for IGS admin use. Live usage was highly beneficial for further evaluation and adjustment prior to rolling out the system for ITO admin and BoE use (the most substantive and complex user base, as it was also replacing the core legacy system). This switchover for the ITO happened in Q3 2011, just prior to the start of the academic session. In late summer 2011 the old client interface was turned off and all users were moved onto TheonUI. In the case of InfHR and CO users this made use of the ability to continue using the legacy desktops imported from the old client. At the time of writing, InfHR still use these legacy desktops (not least because this data is yet to be fully migrated to the 3G model) and the ITO still use a few legacy desktops for a few specific workflow processes which have not yet been migrated to the 3G data model.

There were very regular (generally weekly) user meetings and testing throughout the latter part of 2010 to manage the deployment process, address concerns and make progress reports. Final QA was in early January 2011, just prior to the IGS go-live (although the service had effectively been live for Duties and one user since mid Q4 2010). Despite this, there was a heavy support load post rollout. This has been included in the project effort below, largely because issues arose around the time in the session when features (such as the examiners' processes) were actively being used, rather than immediately on deployment.

6. Effort and Analysis

This project was given 25 weeks of effort. This was probably about enough to cover the core basic work, but seriously underestimated all the fringe effort (which was not immediately apparent at the outset). This probably contributed to a doubling of the core effort, to 50 weeks. Total effort was 80 weeks - the additional weeks almost certainly relate largely to work which was an offshoot of this project rather than a core deliverable of the project itself. There was an under-estimation of the "excitement" factor of having a new system and the possibility of doing even more quite cheaply. It is not possible to differentiate this precisely as the effort reporting at the time did not usually make the distinction (and doing so now would take a significant amount of post-analysis). We attempt below to differentiate the main development areas a little by comparison to the deliverable timeline, bearing in mind that there were multiple concurrent threads, to give a more detailed effort breakdown.

There was also substantial related effort needed on Webmark, the LCFG PostgreSQL component, the Coursework Submission System, EUCLID, EUCLID Interoperability, EUCLID Informatics Enhancements and Prometheus (new feeds and migration) - some of these in explicit projects, some not. None of these are included in this analysis.

The project deliverables were all effectively met by the end of Q2 2011, and core project effort largely tails off then as well. The graph below shows effort in hours over time (in year/quarter blocks) for the core project effort.

effort112.png

The following graph is of effort broken down (approximately) by key development area where this has been possible to do cheaply from the records.

effortbyarea.png

The initial review effort was in 2009Q1 and 2009Q2. Migration effort dominates 2009Q3, 2009Q4 and 2010Q1. However, these quarters also show where effort was spent on key "progression" areas which were actually achieved very quickly (such as the XML Schema and the API spec): they took a tiny fraction of time relative to the entire project, yet underpin most of the subsequent implementation. These were key decisions: adopting a "webapp" approach; how to split the existing application into a client/server design for parallel implementation; how to leverage our existing metadata structure to provide both a better formal foundation and a means to automate the "data-driven" design approach; and adopting and formalizing the use of XML Schema/XSLT as a new underlying schema technology while allowing it to represent the existing client "data", so that existing custom forms would work as-is under the new web application in a browser rather than in a native client application. Some of the ease of this can be put down to decisions made nearly twenty years ago for such large-scale system design. 2010Q2 shows an equal split of work on the entire data model revision (not core project, but considered necessary in some areas and needed to inform development in others) and on the UI prototype. 2010Q3, 2010Q4 and 2011Q1 show effort on the UI itself as well as the development of the PGR and Duty workflows. Effort on the Portal framework appears in 2011Q1 (again not core project as this did not pre-exist specifically, although it could be considered a "migration"/"refactor" of some of the existing generated web pages). The remaining quarters show effort tailing off on the UI and workflows as various enhancements and fixes were made to the live service.

The introduction of the live system for Duties in Q4 2010, PGR records in Q1 2011 and UG/PGT records in Q3 2011 placed a very heavy operational commitment on us and inevitably slowed down further development. This can be seen in the analysis of email correspondence over the period below.

email.png

The graph clearly shows, for example, that related messages reached almost eight times their nominal background level at their peak. While some took only minutes to resolve, most messages in this period would have required technical fixes or enhancements which would have taken hours or days in each case. It is easy to see how requests would have started to stack up and why raw development effort in 2012 was severely diminished.

However, such a deployment approach ("release early, release often"), as used in many open source projects, was also very beneficial in engaging with the user base (getting "buy-in"), which is crucial for systems with this level of impact. Sadly the installed system was not in a sufficiently mature state to be supported by anyone else, certainly not at the level of technical support required. Getting it to that state would likely have delayed deployment by at least a further year, which was considered by all involved to be untenable. Operationally it was preferable to fix end user requests rather than work on aspects of the service of no immediate benefit to end users; certainly this seems a sensible stance during initial deployment and settlement.

The graph below shows combined effort (core and non-core) alongside revision commit change over time. This commit change is related to the migration to PostgreSQL, integration of the new schema and related technology and the development of the UI and Portal.

change.png

The humps of effort relating to migration (2009 Q3/Q4), data model revision and UI development (2010 Q2/Q3), the Duty rollout (2010 Q4) and the PGR rollout (2011 Q1) are clear. The 2011 Q3 rollout is substantially less costly, despite the ITO student workflow being the largest and most complex, as most of the core issues had been addressed by that time. The commit change (excepting migration) tends to trail the actual effort as a lot of development was done "out of repository" during the initial stages.

The following graph breaks down the commit change over time by area as a percentage of the total change.

commits.png

The dominant areas of change are "legacy" representing primarily conduits under the portal and "schema" representing data model changes (although this is a derived extraction and in this context largely represents the desktops). There are small blips for the portal development itself and also for associated tools. The other large areas are the "UI" and the "xslt" which builds both the schema DDL and the UI metadata (so, a lot early on in development).

Total project effort was 80 weeks, however not all of this was core. A breakdown of total effort by area (approximate) is given in the graph below.

effortsummary.png

Again, the critical early work on XML Schema and XSLT (xml/xslt and schema) and the UI API (client/server api) can be seen to take very small amounts of time (3 weeks). The total effort on migration was 18 weeks. The total work on the UI was 23 weeks. The total work on the PGR workflow was 6 weeks. Hence the total for "core" project work amounted to 50 weeks, exactly twice the original estimate (although that was probably not based on much more than picking a number out of the air). Non-core work made up the rest. Management effort alone (meetings, planning and reviewing/scoping requirements) accounted for 6 weeks and was never considered in any estimation. While the data model revision was a significant additional commitment, it can effectively be considered core as most of its changes would have been required anyway by the PGR or Duty workflow or by the teaching assessment changes. Some of the changes planned as part of this model are only just appearing in the live database at the time of reporting, yet the bulk of the planning and technical work for them had been done back then.

Effort on conduits (reports, web pages and system data extracts) is, and has been, a constant background commitment since before the project even began. This covers small enhancements and fixes that arise operationally. Most of this should not be considered core project, except where it was initial development related to specific workflows. The graph below shows this steady "technical" operational commitment over time and how it increased during the main project development timeframe.

conduits.png

7. Approach or The Good, the Bad and the Ugly

This was a very large project both in scale and complexity. It has probably been the largest project undertaken by the Informatics computing team in recent times; the nearest directly comparable projects we can think of are the original DICE project of ten years or so ago, or perhaps some of the more complex platform upgrades.

The DICE project was worked on by the entire computing team, whereas this project was worked on by effectively just two people. While it is possible that the project might have taken less overall time had more people worked on it, this is not absolutely clear; we do believe that either way it would still have been beneficial to the computing team as a whole for there to have been wider involvement.

There was a clear split in this project between the migration phase and the user-facing phase. There were internal splits in both (within the latter, for example, data model revision, UI development and workflow production), however these were never formalised in a particularly visible, structured way (such as project milestones). While doing so may have made progress more transparent externally, it does not necessarily seem that this would have speeded up development or resulted in any improvement in the eventual deliverables; it may even have compromised the "flexible" development.

This project worked very much along the lines of "release early and often", and involved the user base (or key representatives thereof) at a very early stage. While this added a high extra load (both as added management effort and as a shifting requirements base), we are sure that it contributed to much better user acceptance of "change". However, the impact of live release, and the consequential operational load as users established the content and business model and then required changes to match, was hugely underestimated. This was compounded by high-priority events (such as PGR finance, NPRAS reporting, Teaching Support changes, etc.) that Theon could now easily handle and for which it was seen as the perfect vehicle to achieve a fast and flexible response.

One big issue was the "phantom mark" crisis that arose during the first bulk examination period in which Theon was in use. It resulted in a complete re-check of all entered marks by the ITO. This was in itself not necessarily a bad thing, but it did lead to a more cumbersome mark entry workflow. Because the issue was highly intermittent and unrepeatable, its cause was unfortunately almost impossible to determine. A probable cause was, however, eventually established within the last six months and has since been corrected. Beyond this the robustness and security of the developed system has proven itself repeatedly over time.

The system has proven flexible (as shown by fast adaptations to existing business processes and the addition of support for new processes), but the hold-off on updating the underlying schema management system (largely due to bedding-in and end-user change pressures) has limited this flexibility largely to the core developers. This has become a development bottleneck. The release of the replacement schema change management system will hopefully extend this flexibility to others. The integration of the system within our own infrastructure (both systems and administrative business processes) has limited wider deployment to date - change (schema), release (packaging) and configuration (LCFG) management improvements in the back end will hopefully help to address this, if sufficiently resourced.

8. Documentation

Technical documentation for Theon is available on the Theon Trac Wiki. Some of this assumes a certain amount of prerequisite knowledge.

9. Ongoing and Future Work

This is all in the TheonTrac, where there are separate roadmap milestones for the different areas.

-- TimColles - 07 Nov 2013

-- GrahamDutton (edits) - 08 Nov 2013
