LaTeX Aware Repository for Collaborative Working with Web Frontend

Project design and staged implementation for a version controlled repository that is LaTeX aware with simplified command line access and Web front-end for collaborative working. We will call it Coltex for short (COLlaborative laTEX).


Create a Subversion repository to store constituent parts of a LaTeX document being worked on collaboratively. The repository should have hook scripts such that whenever changes are made the final document is automatically rebuilt. The repository should have a web interface to allow the upload (via the web) of new and modified LaTeX files (by authorized users) into the repository. The web interface should also make available the built document and any errors that occurred in building the document.


The system should satisfy the following essential requirements.

  • Changes to the document should be easily visible (like change bar annotation in Microsoft Word)
  • Multiple people should be able to edit the document without conflict and with minimal merge problems
  • It should take advantage of a latex-aware diff (maybe latex-diff) so that changes can be merged in more easily; the problem with text is that line wrapping can - with one minor change to a document - change many lines
  • Must be able to handle arbitrary file types, particularly to cope with figures (e.g. xfig, omnigraffle, illustrator, PDF files...). I don't think we need anything smart to convert formats on the fly though
  • Must be robust to differences in Unix and Windows linefeed / carriage return characters
  • Must allow external users, without a DICE account, to use the system
  • Must work with "all" browsers
  • A command line interface as well as the web-based one - I guess this exists by default anyway (with CVS/Subversion) - particularly so that the system is slick enough for internal-only use (with external collaborators, we are willing to pay a small overhead of inconvenience, but if this can be minimised, then I envisage using it for everything I write in LaTeX)
  • Creation of new user accounts by other users, or some moderated self-registration procedure
  • Documentation built in to the web interface
  • Obviously should be pdflatex based
  • It must be possible to have restricted access (to user-defined groups of people) for documents, rather than a single repository where all users can see everything. This is essential when writing grant proposals, particularly with external collaborators, for example.

If the system could also satisfy the following desirable requirements that would be rather nice.

  • For the command line interface: wrapper scripts to avoid directly calling subversion commands, for less expert users
  • Web access to previous revisions (something like what you get on the Wiki, where you just click on a previous version to see it in the browser) would be very cool, even better if you get highlighted diffs between two versions
  • Possibly being able to view source files in the browser
  • Even a simple Wiki-like text-editor embedded in the browser, for really naive external users to edit documents with ... this may be too hard to set up though

Design and Implementation

I believe the following design specification should realise all of the essential and desirable requirements and produce a neat system (both from the command line and via the web). The design is described below in stages so that a basic system is got up and running quickly and can be evaluated before further command line and web functionality is added. The design is not particularly complex and with care would not require too long for implementation.

Stage One - Subversion Repository

Implement a Subversion Repository Template. This is used to clone a new repository for each "document". Provide a trial server to host the template and cloned repositories (and ultimately web access). This server will be known as "" for Subversion and Web access.

The Template repository has a few additions to a standard one, including default access controls (none, no access by default, except to repository creator). There is also a corresponding Web area associated with each repository. We envisage having paths of the form "" where NAME is the name of the document the repository contains.

The repository will have a post-commit script that attempts to rebuild the output document. This happens in the background - the client always returns immediately after the commit. If the LaTeX build causes an error, a logfile is written into the repository's web area along with an error status file. If the LaTeX build is successful, the generated PDF is committed into the repository and the error status file is updated to success. We also build a "change document": a PDF showing change bars between the current version and the previous version (we will keep an archive of differences). This uses some kind of script that runs latex-diff on HEAD and the previous revision to add extra LaTeX markup to the document, from which the change document is then built. Or might some kind of latex-diff output be sufficient, rather than having to create a new PDF each time?

We do the build in the background, firstly for speed (users don't wait on the build) and secondly to make the web front-end work more simply. Also, when using "svn" directly it might take a number of iterations to reach a successful build (with different people editing at the same time), and we don't want a long wait in between. I think this way will work better. It can be clever and fold multiple build requests (e.g. one commit starts a build; further commits during that build do not trigger builds but set a flag for one rebuild once the current build finishes).
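The folding of build requests described above could be sketched roughly as follows. This is only an illustration under assumptions: the class name BuildCoalescer and its in-memory flags are hypothetical, and since each post-commit hook runs as a separate process, a real version would more likely keep the "running" and "pending" flags as lock files in the repository's web area.

```python
import threading


class BuildCoalescer:
    """Fold commits that arrive during a build into one follow-up build.

    Hypothetical helper: in-memory flags stand in for whatever lock/flag
    files a real post-commit hook would use.
    """

    def __init__(self, build):
        self._build = build            # callable that runs one full document build
        self._lock = threading.Lock()
        self._running = False          # True while a build is in progress
        self._pending = False          # True if a commit arrived during that build

    def request_build(self):
        """Call once per commit. Returns True if this call ran the build loop."""
        with self._lock:
            if self._running:
                self._pending = True   # just set the flag and return at once
                return False
            self._running = True
        while True:
            self._build()
            with self._lock:
                if not self._pending:
                    self._running = False
                    return True
                self._pending = False  # one rebuild covers every folded request
```

However many commits arrive during a build, at most one extra build follows it, which is the behaviour asked for above.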

The script also writes status files into the Apache web area, so that the repository web front-end can show on page load whether a build is in progress or has failed.

The document if successfully built is committed into the repository itself (the post-commit script is clever enough for this to not cause an infinite recursive build). Before a build the existing PDF document is removed from the repository (web can access older revisions if necessary while a build is being completed). The post-commit script handles multiple changes to the status/build by locking in some manner. A post-commit on the PDF triggers a copy of it into the apache web area for the document for web access (separate from svn web access).
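One simple way for the post-commit script to avoid the recursive build is to look at which paths the commit touched and skip the build when only the generated PDF changed. A minimal sketch, assuming the changed paths are obtained from "svnlook changed" (one line per path, with a four-character status prefix) and that the built document is called document.pdf (a hypothetical name, it would really come from the build configuration):

```python
def should_build(changed_lines, output_pdf="document.pdf"):
    """Decide whether a commit should trigger a rebuild.

    changed_lines: lines as printed by `svnlook changed`, e.g. "U   intro.tex".
    output_pdf:    hypothetical repository path of the generated document.
    """
    # Drop the status columns (first four characters) to get the path.
    paths = [line[4:].strip() for line in changed_lines if line.strip()]
    # Build only if something other than the generated PDF was changed.
    return any(path != output_pdf for path in paths)
```

A commit containing only the generated PDF therefore does not start another build, while any commit touching a source file does.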

The post-commit script is also used to produce an RSS feed for the document web area (later).

Also in the template is an initial file in the repository called "pdflatex.conf" which is used by the post-commit script to control the document build process (which file to LaTeX plus other options) - it can of course be modified to suit the particular document although there will be standard defaults.
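The format of pdflatex.conf is not decided above. Purely as an illustration, assuming an INI-style file with a [build] section and the hypothetical keys main, runs and bibtex, the post-commit script might read it like this:

```python
import configparser

# Hypothetical defaults; the real template would define its own.
DEFAULTS = {"main": "main.tex", "runs": "2", "bibtex": "no"}


def load_build_config(text):
    """Parse the text of pdflatex.conf, falling back to the defaults."""
    parser = configparser.ConfigParser(defaults=DEFAULTS)
    parser.read_string(text)
    if parser.has_section("build"):
        section = parser["build"]
    else:
        section = parser[configparser.DEFAULTSECT]
    return {
        "main": section.get("main"),
        "runs": section.getint("runs"),
        "bibtex": section.getboolean("bibtex"),
    }


def build_commands(config):
    """Return the pdflatex command lines for one build (bibtex step omitted)."""
    command = ["pdflatex", "-interaction=nonstopmode", "-halt-on-error", config["main"]]
    return [list(command) for _ in range(config["runs"])]
```

With a file containing only "main = proposal.tex" under [build], this would run pdflatex twice on proposal.tex; nonstopmode stops the background build from waiting for input on errors.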

Check that all of the above works by manually creating a repository to match the template and editing a document. This is done through Subversion file:/// only at this stage, with only one user.

Stage Two - Subversion with Apache and Access

Configure http access to the repository(s) by setting up Apache with mod_dav, mod_dav_svn and mod_authz_svn. This will also be used for access control management. Check web access works.

Stage Three - Command Line Interaction and ACL control

Implement coltex admin create NAME command which creates a new document repository called NAME from the template. This also creates a USER for Subversion who will be the admin for that repository with separate password (so other commands can be done via web later). Only a DICE user can create a new document repository. Also creates a web area for that document.

Implement ACL management commands:

coltex admin create USER - add a new user; generates a random password and emails it to the user?

coltex admin delete USER - removes the user

coltex admin update USER - change details (name, email, password)

The above all simply manage the Apache .htpasswd file used for the Subversion repository access (users are common to all). Any DICE user in group (coltex_admin) can use the above commands. Note that user accounts are not per-repository; the same user account can be used for all repositories. However, the access for each repository is controlled as below. The .htpasswd and other user data files are probably stored in a separate control subversion repository (with access by group coltex_admin, DICE only).
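As a rough sketch of what the user creation command has to do, the following generates a random password and a matching .htpasswd line. It assumes Apache's "{SHA}" password scheme only because that can be computed with the standard library; it is unsalted, so the real commands might prefer to call the htpasswd tool with a stronger scheme. The function names are hypothetical.

```python
import base64
import hashlib
import secrets
import string


def random_password(length=12):
    """Generate a random alphanumeric password to email to the new user."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def htpasswd_line(user, password):
    """Build one .htpasswd entry using Apache's {SHA} scheme (base64 of SHA-1)."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return "%s:{SHA}%s" % (user, base64.b64encode(digest).decode("ascii"))
```

Deleting a user is then just removing their line from the file, and updating a password is replacing it.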

coltex admin access NAME - edit ACL's for access to repository

The above is for manipulating the content of the access file (in the same format svnserve uses for path-based authorization) that mod_authz_svn, alongside mod_dav_svn, uses to control users' access to repositories and files.
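To make the access file concrete, here is a minimal sketch that writes one per-repository section in the Subversion path-based authorization format, giving the listed users "r" or "rw" and everyone else no access. The function name and the example user names are made up for illustration.

```python
def authz_section(repo, acl, path="/"):
    """Render one access-file section for a repository.

    repo: repository NAME; acl: mapping of user name to "r" or "rw".
    """
    lines = ["[%s:%s]" % (repo, path)]
    for user in sorted(acl):
        lines.append("%s = %s" % (user, acl[user]))
    lines.append("* =")  # no access by default for everybody else
    return "\n".join(lines) + "\n"
```

For example, authz_section("proposal", {"tim": "rw", "ext1": "r"}) gives tim read/write and ext1 read-only access to the whole of the proposal repository. Per-file control would add further sections with a longer path.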

Only DICE users can create repository and manage users etc. but they can create any users for anywhere (do not need to sign regulations if not published beyond collaboration group, but DICE owner is moderator and responsible for content). Access to repository is username/password, and even for DICE users is a separate password. Users are common across all repositories but access controls are per-repository.

We could support access control per repository only, or right down to per-file control? We use an svnserve.conf-like access file as part of the mod_dav_svn module (see manual). However, per-file access control is a lot slower (through DAV) because of the multiple checks that need to be made.

For the above commands to work DICE users will either need access to the repository server or we will need to implement via DAV or better still could have a special repository for the purpose of managing the apache setup and access control?

Normal svn commands will work at this point, so check how well document building (plus error handling) works etc. Repository access will be via https:// and BasicAuth. There will be minimal (read-only) access through the web to the svn repository in this way (use an XSLT stylesheet to make it look nice).

*check requirement here - we are saying only DICE users can create and administer document repositories (add users etc) but anyone (who is a user added by the DICE repository creator) can collaborate on a specific document, managed via standard Apache BasicAuth and Subversion access control - is this sufficient?*

Stage Four - Command Line Wrappers

Implement coltex edit NAME FILE command which is a wrapper script to edit a .tex (or other) file within the NAME document repository. Loosely it does:

  • svn checkout into temp area (or svn update if already checked out)
  • start up the user's editor to change FILE
  • svn commit -m "blah" FILE back into repository

It is slightly more clever, if you choose a FILE that does not exist it can create it (prompted). It allows re-entry, ie. you can opt not to commit changes and use the same command to edit the file again later.
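The basic sequence of coltex edit can be sketched as a function that works out which commands to run, leaving the running (and the prompting and re-entry logic) to the caller. This is an illustration only: the function name is hypothetical, and repo_url is whatever URL the repository for NAME ends up having.

```python
import os


def edit_plan(repo_url, workdir, filename, message, editor=None):
    """Return the command lines for one `coltex edit` run, in order."""
    editor = editor or os.environ.get("EDITOR", "vi")
    if os.path.isdir(os.path.join(workdir, ".svn")):
        fetch = ["svn", "update", workdir]              # already checked out
    else:
        fetch = ["svn", "checkout", repo_url, workdir]  # first use
    target = os.path.join(workdir, filename)
    return [
        fetch,
        [editor, target],
        ["svn", "commit", "-m", message, target],
    ]
```

Each returned list could be passed to subprocess.run in turn; skipping the last one gives the "don't commit yet, edit again later" behaviour.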

Importantly it also "normalizes" the LaTeX in the file before doing the commit (and before edit, which may result in changes even if none are made explicitly if the file has previously been changed via svn commands). This should help prevent merging problems and minimize differences between versions so they have less irrelevant noise/clutter. It would be nice to do this as a pre-commit hook script in subversion (so that normal svn commands will also silently normalize) however modifying a transaction via a hook script is a big NO NO apparently.
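What "normalizing" should cover is still open. A minimal sketch, assuming it at least converts Windows and old Mac line endings to Unix ones, strips trailing whitespace and ends the file with exactly one newline:

```python
def normalize_tex(text):
    """Normalize line endings and trailing whitespace in LaTeX source."""
    # Convert CRLF first, then any remaining bare CR, to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = [line.rstrip() for line in text.split("\n")]
    # Drop blank lines at the end so the file ends with a single newline.
    while lines and lines[-1] == "":
        lines.pop()
    return "\n".join(lines) + "\n"
```

The function is idempotent, so running it both before the edit and before the commit is harmless. Re-wrapping paragraphs (say one sentence per line) to reduce diff noise further would be a possible extension, but needs care around comments and verbatim text.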

It will handle merge conflicts by silently adding the conflicting merge data to the document as LaTeX markup, annotated with the user who caused the conflict, and then just warning the user - they can then opt to go in and modify the markup directly or leave it for someone more experienced? This doesn't handle a file being deleted underfoot, however.
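One possible reading of this, sketched under assumptions: take the file with Subversion's usual conflict markers (lines starting "<<<<<<< ", "=======" and ">>>>>>> ") and replace each marker line with a visible LaTeX note, so the document still builds and shows both versions. The wording of the notes and the function name are invented here, and a user name containing LaTeX special characters would need escaping first.

```python
def annotate_conflicts(text, user):
    """Replace svn conflict marker lines with visible LaTeX notes."""
    out = []
    in_conflict = False
    for line in text.split("\n"):
        if line.startswith("<<<<<<< "):
            in_conflict = True
            out.append(r"\textbf{[CONFLICT: version from " + user + "]}")
        elif in_conflict and line == "=======":
            out.append(r"\textbf{[CONFLICT: version in repository]}")
        elif in_conflict and line.startswith(">>>>>>> "):
            in_conflict = False
            out.append(r"\textbf{[CONFLICT ends]}")
        else:
            out.append(line)  # ordinary text, including both conflicting versions
    return "\n".join(out)
```

A line of equals signs outside a conflict block is left alone, since it could be legitimate document text.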

This command does not return any LaTeX build errors (which could be because of somebody else's changes); it just returns immediately after the commit (the build of the output is a separate background process). This allows multiple edits without waiting on a build each time.

Doing coltex test NAME will wait on a build in progress and report whether it was successful, showing the log output if not or opening the document (and/or change document) if it was. It can check the build progress and log by doing wget on the files generated by the post-commit hook script in the repository's web area. Once built, the actual PDF document is retrieved by doing an "svn update". We need to be able to tell the user whether a build error is directly because of their changes or not (or not yet anyway). We can extract the cached svn password to authenticate the connection without bothering the user for a password.
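The waiting part of coltex test might look something like the sketch below. It assumes the status file holds a single word, "building", "success" or "error" (a made-up format), and takes the fetching as a function argument so that it could equally be wget, urllib or a local file read.

```python
import time


def wait_for_build(fetch_status, poll_seconds=2.0, max_polls=150, sleep=time.sleep):
    """Poll the build status until it settles.

    fetch_status: callable returning the current text of the status file.
    Returns "success", "error", or "timeout" if max_polls is exhausted.
    """
    for _ in range(max_polls):
        status = fetch_status().strip()
        if status in ("success", "error"):
            return status
        sleep(poll_seconds)  # build still in progress, try again shortly
    return "timeout"
```

On "error" the command would then fetch and show the log file; on "success" it would do the "svn update" and open the PDF.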

We will have a coltex list NAME to list all the files in the document repository.

Implement coltex diff NAME FILE command which substitutes the default svn diff command with latex-diff. By default shows the difference between HEAD version and previous version?
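A rough sketch of how coltex diff could be put together, assuming the tool meant here is latexdiff (which takes the old and new files as arguments and writes the marked-up LaTeX to standard output) and that FILE is a path in a working copy, so the PREV and HEAD revision keywords can be used. The temporary file names are hypothetical.

```python
def diff_plan(target, old_rev="PREV", new_rev="HEAD",
              old_copy="old.tex", new_copy="new.tex", diff_file="diff.tex"):
    """Return (command, output_file) pairs for a LaTeX-aware diff.

    Each command's standard output is meant to be saved to output_file.
    """
    return [
        (["svn", "cat", "-r", old_rev, target], old_copy),   # previous version
        (["svn", "cat", "-r", new_rev, target], new_copy),   # latest version
        (["latexdiff", old_copy, new_copy], diff_file),      # marked-up differences
    ]
```

Running pdflatex on the resulting diff file would give a PDF with the changes highlighted, which is also roughly what the post-commit script needs for the change document.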

Note that for coltex commands NAME is optional when run from inside a working directory, in which case the command applies to that working directory. Since working directories live in a temporary area, the shorthand coltex goto NAME takes you there.

Obviously have coltex help and coltex help etc.

It is simplest to get all the functional aspects of the system worked out via command line access - this can then easily and quickly be translated to the web via CGI scripts.

Stage Five - Web Access

Add a minimal CGI front-end to allow files to be edited through browser text box (basically a web implementation of coltex edit). Add some administration CGI web pages (for coltex admin commands to do user and ACL management).

Also some basic web pages to show generated document (and change document) as well as release difference (highlighted latex-diff output for example) etc. Also could look at existing svn web interfaces for document management.

Also single click links to previous revisions and differences, this can possibly be achieved with an already out there subversion web interface.

Need to do something here for file upload and for image upload; not quite sure of the details for that yet.

All coltex commands are simple shell script wrappers around subversion and file/directory management - should work on any UNIX system (will be available for download via the web repos system for collaborators who want to use instead of web tools). Of course normal svn commands will work as well, with full instructions on web.

Use a simple CMS type thing as a wrapper around the front-end (eg. Wordpress) for managing the overall web framework and online documentation. May want an integrated forum (eg. phpBB) for online collaboration.

Other Thoughts

PySVN seems a nice module - the commands could be scripted in Python with PySVN providing the Subversion interface - probably better than shell scripting.

-- TimColles - 20 Jul 2007

Topic revision: r3 - 07 Sep 2007 - 10:51:49 - TimColles