Final Report : remove obsolete deps for LCFG client (274)

The Goal

The primary goal for this project was to remove the LCFG client dependency on obsolete W3C::SAX XML modules by switching to the XML::LibXML module, which is based on libxml. The plan was to create a new object-oriented API for accessing and manipulating the profile, component, resource and package information for an LCFG client profile. The intention was that, whilst the API would initially be used only for the rdxprof client process, it would be designed so that it could be used by all LCFG software, on both the client and server side, which needs to read and write profile data in the various supported formats. Reducing the numerous separate implementations to a single code base would improve maintainability and would also open the way to new feature development, since any new feature would only need to be implemented once.

The Project

The project has taken nearly 4 years to reach completion. Although a lot of effort was involved, the main reason it took so long was that many higher priority projects took precedence (e.g. the SL7 platform and server upgrades). The work was done in 3 distinct phases:

  1. Prototype : December 2013 to February 2014. A prototype Perl module was rapidly developed for parsing LCFG XML profiles. This used XML::LibXML and Moose (later replaced with Moo). Some interesting ideas were investigated, including storing all the components/resources for the current profile in memory with all possible context states to allow rapid querying and context-change handling. The prototype was also used to test the idea of representing the entire set of differences between profiles as an object tree and using a change handler module to calculate the necessary response to any changes. Although reasonably successful in terms of the original requirements, this prototype proved to be slow and very memory heavy (50 to 100MB for an average profile) compared to the v3 client. As a quick test, a simple (and not feature complete) parser for the LCFG XML profiles was written in C using the libxml library. This proved to be much faster and less memory intensive, so a decision was taken to do the XML parsing in C but keep most other functionality at the Perl level. The results of the investigations during this period were very useful when it came to designing the final products, but it is likely that very little of the actual prototype code survived.

  2. Consolidation : September 2015 to October 2016. After a long period of inactivity due to the SL7 platform work, a fresh look was taken at the goals for this project. The idea of extending the code beyond just the needs of the LCFG client emerged, with the vision of a single set of core libraries which all client (and eventually server) code could use for processing LCFG profiles. The idea of holding all component/resource data in memory for all possible context states was dropped as it was still too memory intensive, and the approach was switched to the simpler method of evaluating the contexts as the XML profile is processed. During this period all the basic functionality necessary to run the client rdxprof tool was completed, and most of those APIs changed little through to the end of the project. A new change handler was written and a first attempt was made at porting the client to the new libraries. At this stage the core libraries could be used to parse most Informatics profiles, except for those with resources that had complex context expressions.

  3. Completion : March 2017 to November 2017. The work on this project was further interrupted by SL7 server upgrades. On return a concerted effort was made to complete the support for the client; in particular, a new context expression parser was written using flex and bison. The APIs for the C libraries were fully documented using doxygen and many minor improvements were made to make the code clearer and more consistent. The libraries were tested using LCFG profiles from both Informatics and the wider community, which led to the detection and resolution of many small bugs. The code was extensively checked for memory leaks using tools such as valgrind, and it was submitted to the Coverity static analysis site, which proved to be very helpful for improving the code quality. To ensure a reasonable level of platform independence the code was built and tested on both Red Hat and Debian systems. As a bonus, a new high-level client-side Perl module, LCFG::Client::Resources, was created to leverage the functionality of the new core libraries, and the core tools were ported to this module. This proved to be a useful exercise in checking how easy it was to work with the new APIs and led to many minor improvements.

Along the way I blogged about various topics which can be found via the lcfg-client tag.

The Final Products

This project has produced new C and Perl core libraries and the LCFG client has been updated to use those new libraries:

1. Core C libraries

This provides most of the functionality required to read and write LCFG profiles from the various source formats. It covers both components/resources and packages. The intention is that, wherever possible, the core functionality is in this layer so that it can be reused from languages other than Perl (e.g. Python).

It is split into the following sections:

Support for reading/writing component data from/to Berkeley DB files. This is the format used by the LCFG client to store profiles once they are imported from XML.
Support for reading profiles (both components and packages) from the LCFG XML file generated by the LCFG server. Note that there is currently no support for writing to this format.
Support for parsing LCFG context expressions
Support for manipulating LCFG package specifications and reading/writing package lists in various formats.
Support for manipulating LCFG resources and components.
Support for manipulating entire LCFG profiles (not much code in this, mostly just provides a high-level interface to other libraries)
An import of the old lcfg-utils code, with some improvements and new functions added.

The APIs for the libraries are documented using doxygen and the docs are available as HTML and PDF in the lcfg-core-doc package. The plan is to also make them available on the LCFG website. There is probably a need for some higher-level "programmers guides" to the various libraries to help people get started with developing against them.

2. Core Perl libraries

Most LCFG code is written in Perl and thus the main interface to the new core code is via the Perl libraries. These are split into the following sections:

Object-oriented interface to a single LCFG resource.
Object-oriented interface to an LCFG component. Supports reading/writing from/to various file formats.
Object-oriented interface to a single LCFG package.
LCFG::Package::List and LCFG::Package::Set
Object-oriented interface to a collection of packages, either as an ordered list or a set. Supports reading/writing from/to various file formats.
Object-oriented interface to entire LCFG profile. Supports reading/writing from/to various file formats.
Object-oriented interface to the differences between two profiles, i.e. components added/removed/changed and resources added/removed/changed. There is no support for package diffs.

The Perl module APIs are documented in the standard way using POD. For simplicity, most of the testing is done at this level using the standard Test::More framework. Again there is a need for high-level "programmers guides" for these libraries.

3. Updated client

By switching to the new core libraries the client profile "build" system has been massively simplified. The client no longer requires any knowledge of how to parse XML profiles from the server or how to read and write the local Berkeley DB files.

Along with this a new "change handler" has been introduced which uses the diffs between the old and new profiles to decide what action is required (i.e. which components to configure).
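The idea of driving the change handler from a profile diff can be illustrated with a minimal sketch. This is not the LCFG implementation (which lives in the C and Perl libraries); it is written in Python purely for illustration, and the profile layout, the `ProfileDiff` class and the function names are all hypothetical. It assumes a profile can be reduced to a mapping of component names to resource name/value pairs.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory profile: {component name: {resource name: value}}.
Profile = dict[str, dict[str, str]]


@dataclass
class ProfileDiff:
    """Differences between two profiles (illustrative structure only)."""
    added: set[str] = field(default_factory=set)      # components only in the new profile
    removed: set[str] = field(default_factory=set)    # components only in the old profile
    changed: dict[str, set[str]] = field(default_factory=dict)  # component -> changed resources


def diff_profiles(old: Profile, new: Profile) -> ProfileDiff:
    """Compare two profiles component by component, then resource by resource."""
    diff = ProfileDiff(added=set(new) - set(old), removed=set(old) - set(new))
    for name in set(old) & set(new):
        old_res, new_res = old[name], new[name]
        # A resource counts as changed if it was added, removed or given a new value.
        touched = {
            key for key in set(old_res) | set(new_res)
            if old_res.get(key) != new_res.get(key)
        }
        if touched:
            diff.changed[name] = touched
    return diff


def components_to_configure(diff: ProfileDiff) -> list[str]:
    """Assumed policy: configure every component that is new or has changed resources."""
    return sorted(diff.added | set(diff.changed))
```

How removed components should be handled is not described in this report, so the sketch leaves them out of the list of components to configure.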

The rdxprof tool has always supported the fetching and processing of profiles other than for the localhost but this has always worked more by luck than design. The new client has been modified to be very careful to only apply changes to the local system if the profile being processed is applicable.

A new LCFG::Client::Resources module has been introduced which can replace LCFG::Resources. By moving this code from the perl-LCFG-Utils package into the client package a major bootstrapping issue has been overcome. It is also now slightly higher-level, so it can use the client code to get the local node name, which simplifies most situations where it is likely to be used.

The standard qxprof, sxprof and qxpack utilities have been ported to the new LCFG::Client::Resources module. A new whererpms tool has been introduced which can be used to search the current rpmpath for packages.

Other Benefits

Improvements to code quality
C is a lot less forgiving about vaguely defined behaviour than Perl so as part of the process of writing new C code many features needed to be better defined. This process also revealed a few bugs (that are now fixed) which could lead to corruption or misinterpretation of resource data on the client side.

Improvements to profile data quality
The testing phase uncovered many genuine bugs in various component schemas which are not being caught by the current LCFG compiler. In particular it has become clear that the resource tag validation in the LCFG server needs to be improved. The hope is that one day the server will be converted to use the new core libraries.

Personal benefits
This project has given me many opportunities to enhance my programming knowledge and skills. I have gained a much deeper knowledge of the C programming language and have learnt a lot about combining C with Perl by using XS code. I have also learnt how to create parsers using the flex and bison tools. Along the way I have become familiar with tools such as valgrind which is used to check for memory leaks and the Coverity static analysis tool.

Bootstrapping improvements
Whilst porting some of the LCFG utilities (e.g. qxprof, qxpack) to the new libraries a number of circular build dependency issues were identified which make porting to new platforms difficult. To improve bootstrapping on new platforms (e.g. EL8) various scripts and Perl modules have been moved between packages.

General Thoughts

Writing documentation for a large API can be tedious but I also found it to be a very useful code review process. For example, once you notice that there are 4 different functions which can be used to serialise resource information in the various required formats you soon realise that the APIs should be consistent so that it is easy to swap from one to the other. It is also useful for spotting areas where common code can be shared and reused.

Writing a large library in isolation would not be a good idea. In this project the C and Perl libraries were developed in parallel. The Perl layer uses the functionality provided by the C library, which helps ensure that the C library has a sensible and easy to use API. Similarly the Perl libraries were much improved through the work of porting the standard qxprof, sxprof and qxpack utilities. I hadn't originally planned to port those tools but the work has proved to be very useful. In my opinion, creating a good API needs more than just an awareness of the original stated requirements; it needs real code which actually uses the functionality. That could be in the form of a test suite, but tests are often only focussed on small parts of a library. Testing small chunks of code is important, but porting a whole script gives the API a much more complete workout and provides an understanding of how others would approach using a new library.

The Future

In no particular order here are some thoughts on potential future work:

Programmers guides
It would be nice to have programmers guides for the core C and Perl libraries to help people get started.
Finish porting client Perl code which uses LCFG::Resources over to the new LCFG::Client::Resources module
Perl Components
Introduce a new LCFG::Client::Component framework to replace LCFG::Component. That would use the new APIs and also incorporate many other improvements and ideas which have been suggested over the years.
Port the LCFG server to using the libraries. It might be possible to do parts without doing a full rewrite (e.g. the package list handling). Doing this would allow us to start developing new server functionality again.
Improve the type handling for more power/flexibility. Currently it's not possible to do this all in the C layer; most validation is still done in the Perl layer. New types can only be added by limiting the possible strings with Perl regexps.
Switch component resource data structure from linked list to hash for greater speed. Doing this for the packages data structure proved to be a big improvement and it would probably be essential for porting the LCFG server.
Package lists
Add support for diffing lists of packages. Also there are various other features which are currently only in the pkglist-tools scripts which could be merged into the core libraries so they are easier to reuse.
Port updaterpms to the new libraries.
Introduce a Python framework. It should be possible to wrap the C library with Python modules.
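The regexp-based type handling mentioned in the list above (a new type is defined only by restricting which strings are acceptable) can be sketched as follows. This is an illustration in Python, not the Perl layer's actual code; the type names, patterns and function names are hypothetical and are not LCFG's real type definitions.

```python
import re

# Hypothetical type table: each type name maps to a pattern which a
# resource value must match in full. These are not LCFG's real types.
TYPE_PATTERNS = {
    "integer": re.compile(r"-?\d+"),
    "boolean": re.compile(r"yes|no|true|false|on|off|1|0"),
}


def add_type(name: str, pattern: str) -> None:
    """Register a new type purely by restricting the strings it accepts."""
    TYPE_PATTERNS[name] = re.compile(pattern)


def is_valid(type_name: str, value: str) -> bool:
    """Return True if the whole value matches the pattern for the named type."""
    pattern = TYPE_PATTERNS.get(type_name)
    if pattern is None:
        raise KeyError(f"unknown type: {type_name}")
    return pattern.fullmatch(value) is not None
```

The limitation is visible in the sketch: a pattern can say that a value looks like a port number, for example `add_type("port", r"\d{1,5}")`, but it cannot easily express a numeric range or a relationship to another resource, which is why richer type handling is listed as future work.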


Period     Hours
2013 T3       45
2014 T1       90
2014 T2       22
2014 T3        0
2015 T1       49
2015 T2       34
2015 T3      178
2016 T1      156
2016 T2      216
2016 T3       58
2017 T1      148
2017 T2      256
2017 T3       20
TOTAL       1272

182 days / 37 weeks effort.
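
The report does not say how the hours were converted into days and weeks. As a quick consistency check, the sketch below sums the per-period hours from the table and converts the total under two assumptions that are mine rather than the report's: a 7-hour working day and a 5-day working week, with weeks rounded up. This is one reading that reproduces both quoted figures.

```python
import math

# Hours per period, copied from the table above.
hours = {
    "2013 T3": 45, "2014 T1": 90, "2014 T2": 22, "2014 T3": 0,
    "2015 T1": 49, "2015 T2": 34, "2015 T3": 178,
    "2016 T1": 156, "2016 T2": 216, "2016 T3": 58,
    "2017 T1": 148, "2017 T2": 256, "2017 T3": 20,
}

HOURS_PER_DAY = 7   # assumption, not stated in the report
DAYS_PER_WEEK = 5   # assumption, not stated in the report

total = sum(hours.values())
days = round(total / HOURS_PER_DAY)       # nearest whole day
weeks = math.ceil(days / DAYS_PER_WEEK)   # partial weeks rounded up

print(total, days, weeks)
```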

-- StephenQuinney - 23 Nov 2017

Topic revision: r5 - 08 Dec 2017 - 09:15:47 - StephenQuinney