Final Report for the Software Build Farm Project (#115)

A full description of the background, original motivation and design overview for the project to create a software build farm for Informatics is provided at: The details of the original plan are also available at devproj#115

Further to this, online documentation is provided for users, administrators and programmers at

The original dates in the project plan slipped considerably, mainly because other higher-priority projects took precedence. In particular, the SL6 porting project took up quite a bit of my time.

Approximately 51 days of effort were expended on this project, which is much greater than the original estimate of 4 weeks. There is no doubt that some of the extra effort was due to an underestimation of the complexity of the code required. Over 6000 lines of code have been written; much of this was needed to create frameworks that help with running code as unix daemons, building packages with mock and accessing the database.

At the start of the project there was already a rough prototype for a large part of the system. Although the final high-level design of the system did not differ substantially from that of the prototype, I definitely underestimated how much of it would need to be thrown away and rewritten. As development went along it became clear that a number of areas needed a lot more thought. In particular, getting the build daemons working well took much more effort than predicted.

In the process of developing this project I have learnt a lot more about using the Moose object-oriented programming framework, the DBIx::Class database access framework and the Catalyst web framework. I expect that most of this knowledge will be reusable for future projects; there is certainly some overlap with the code requirements of the LCFG server refactoring project.

A reasonable amount of effort was spent ensuring that the code was generic enough to be used by external users and extensible enough to handle different source formats and different build platforms (e.g. MacOSX). The lcfg-pkgsubmit component was created for this project and the lcfg-mock component was substantially improved.

The documentation took more effort than I originally predicted. In particular, a lot of time was spent ensuring all the administration aspects were covered. In future I would allow more time in the plan of a large project to ensure the documentation gets plenty of attention.

All milestones have been completed except the final one (number 12). I propose to convert this final milestone - "Add basic passive nagios monitoring for the daemons" - into an MPU small project which will be done in T2 or T3 of 2011. This will allow me to move on to other high-priority projects and come back to this lower-priority issue later. The system has proved to be very reliable; the only issue seen so far, the master server not handling short network drop-outs, has now been resolved. The system is also not a critical part of the infrastructure we provide to our users, so it being unavailable overnight or over a weekend is not a big problem.

The system has been running since the beginning of February 2011 and has been through a period of user acceptance. It has been used extensively, and with great success, for the LCFG SL6 porting project. It has also undergone some "stress testing" thanks to RAT and their rather large matlab packages. The stress tests revealed some scalability issues; in particular, the AFS caches were far too small on all the PkgForge servers. The service is now considered to be fully stable and reliable.

There is no doubt that more features could be added to this system. The main area which requires some attention is the web interface which is functional but lacking in some features. It would be nice to be able to view live job activity, manage tasks (cancel and reschedule, for instance) and upload jobs (for users without AFS access). A wishlist will be maintained at:

I suggest that the list be reviewed in a year's time, with a view to a possible follow-up project to fix any bugs found and add any high-priority features.

-- StephenQuinney - 13 May 2011

Topic revision: r1 - 13 May 2011 - 10:26:27 - StephenQuinney