Final report: Port LCFG to RHEL7 (296)

The Goal

The project was to port the LCFG configuration system to RHEL7-based platforms. We ported to the "inf" level - that is, we ported the entirety of the LCFG system that's shared across the University, adding as little Informatics configuration on top as was necessary to construct working machine configurations. Other projects would then build on this to produce complete working environments for desktops and for servers.

Major changes

Significant parts of LCFG have had to be altered, redesigned or replaced. The highlights of the work include:

Changes related to booting and to the control of daemons
(all of which is now under the control of systemd). This was a significant extra task not normally encountered in OS upgrade LCFG ports. The switch to systemd was a major, fundamental change.
Deploying the new lcfg-systemd component
Having learned (we thought) how systemd works and how it should be configured, and having written the lcfg-systemd component to impose LCFG control on its configuration, we then had to find out the hard way that practical systemd use is not the same as the theory, and that our understanding of systemd configuration had to be yet more detailed and also more shaped by our experiences. There was a long process of repeated rethinks and adjustments before a satisfactory working system was achieved. Like most work related to systemd, this was time-consuming and would not normally be expected as part of an LCFG port. See the blog for details.
Shifting the main porting target OS several times
We worked with Fedora 19 and 20, the Red Hat Enterprise Linux beta and Scientific Linux 7. At each stage of the process, the OS we were porting to was the best obtainable approximation of the target operating system, which was Scientific Linux 7 or equivalent. We started the porting project in good time for the expected release date of RHEL7 and SL7, in order to get the port done as quickly as possible after the final OS was released. At that point we still expected this to happen in time for SL7 DICE to be deployed for the 2014-15 session, but only just: time was expected to be tight, so we tried to get as much as possible done before the release of the final operating system. Had we known that RHEL7 would ship months later than expected, and that the creation of the SL7 clone distribution of RHEL7 would also take significantly longer than previous ones had, we would perhaps have been able to start some months later and to eliminate some of the shifts of porting target. However, there was no way for us to know the release dates in advance.
Reworking component start ordering
and creating LCFG-specific systemd targets - made necessary by systemd. See the blog for details.
Support for grub2 instead of grub
The switch from grub to grub2 was another major technology shift, of a severity not often encountered in OS upgrade projects, and it added several weeks of extra work to the project. See the blog for details: 18/4/2014 and 18/2/2015.
Adding a "service" method
for components to start, stop and otherwise control daemons on all platforms. This was another consequence of the move to systemd, rather than a normal part of a port to a new OS. See the blog for details: 27/6/2014 and 5/3/2015.
OpenAFS client changes
made necessary by the move to systemd. The component has been split into a separate openafs_client component which does not stop or start the client daemon. See the blog for details.
Significant work on LCFG headers and package lists
Any port of LCFG to a new OS version means editing many headers and creating a number of new package lists, but the move to systemd meant that this work had to be a great deal more thorough than normal. In particular we needed a major redesign of the local package list structure. The disappearance of the "boot" component was another major driver of this work.
Changes to build dependencies
for example caused by the formerly monolithic Perl package having been split into many smaller optional packages. Some degree of change (of dependencies) is inevitable in every major OS upgrade, but this upgrade brought significantly more change than we had encountered in previous OS upgrade projects.
Adopting a new approach to mirroring the EPEL repository
We had to do this to adjust to changes brought in for EL7.
A facility for daily calls of components and other software
to replace boot.run. This work was necessitated by the move to systemd. Details are at TaskRunner.
Getting graphical logins working with GDM
If we're lucky the existing graphical login configuration can be re-used for a new OS with minimal effort. In this project we were not lucky, as our SL6 graphical login configuration used KDM, which wasn't available for SL7. We came up with basic local configuration for GDM graphical login screens.
A new lcfg-dconf component
to manage GDM and other GNOME software. This was needed because of another major technology change in RHEL7, from GNOME 2 to GNOME 3, which brought with it a change in the underlying GNOME configuration technology from gconf to dconf.
Changing graphical logins from GDM to LightDM
including creating a new lcfg-lightdm component. Having tried GDM we found it unsatisfactory in a number of ways, but the one which persuaded us to replace it with LightDM was GDM's tendency to confuse users into entering their passwords into the (visibly readable) username field of the login screen, and Red Hat's apparent lack of urgency in fixing this problem. See the blog for details.
LCFG-controlled dynamic image generation
for login screens. This provides a configurable way to generate images for the LightDM greeter window, and indeed for other purposes. We needed an LCFG-controlled way to configure greeter images, and neither LightDM nor GDM provides much in the way of support for this. This was another development made necessary by the lack of KDM.
Adding support to the package management tools
Porting lcfg-sleep
to provide the LCFG-controlled desktop sleep/wake facility. This would always have to be done as part of the port of LCFG to a new OS version, but in this case it took longer than usual because of the move to systemd, as OS facilities used by the component have changed.
Extending LCFG's system information framework
to describe general features of a system. This was prompted by a need in EL7 but will be generally useful.
Porting the LCFG install mechanism
This is a standard part of an LCFG port, but the move to systemd necessitated a deeper than usual reexamination of the install mechanism, taking extra time. We added new support for running install scripts in the newly installed installroot - see the blog for details.
Porting the majority of MPU components used on desktop machines
and configuring these for EL7. A standard, known part of an LCFG port.
Rewriting several components into Perl
for example lcfg-hardware. This was opportunistic preventative maintenance; we intend to phase out shell components at the next OS upgrade, so it made sense to convert some of them now.
Adding support to Package Forge for the new OS
This is a known and standard part of an OS upgrade LCFG port.
Porting LCFG Build Tools
another standard component of a port of LCFG to an upgraded OS.

Effort

Period Hours
2014 T1 244
2014 T2 467
2014 T3 375
2015 T1 366
Total 1452

We estimated ten weeks of effort for the project - slightly more than the 334 hours it took to port LCFG to SL6. The final total will be about four times that estimate. The technical changes needed were a good deal more profound than we had expected. Many details of the LCFG ecosystem have had to be revisited and redesigned or replaced.
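The arithmetic behind these figures can be checked with a short sketch. The hours per period, the 334-hour SL6 figure and the ten-week estimate come from this report; the 35-hour working week used to convert weeks to hours is an assumption made purely for illustration, so the ratio to the estimate is approximate.

```python
# Hours booked per period, copied from the Effort table above.
hours_by_period = {
    "2014 T1": 244,
    "2014 T2": 467,
    "2014 T3": 375,
    "2015 T1": 366,
}

SL6_PORT_HOURS = 334          # stated in the report
ESTIMATE_WEEKS = 10           # stated in the report
ASSUMED_HOURS_PER_WEEK = 35   # ASSUMPTION: not stated in the report

total_hours = sum(hours_by_period.values())
estimate_hours = ESTIMATE_WEEKS * ASSUMED_HOURS_PER_WEEK

# How far the actual effort exceeded the estimate and the SL6 port.
ratio_to_estimate = total_hours / estimate_hours
ratio_to_sl6 = total_hours / SL6_PORT_HOURS

print(f"Total effort: {total_hours} hours")
print(f"Estimate (assumed {ASSUMED_HOURS_PER_WEEK}h week): {estimate_hours} hours")
print(f"Actual / estimate: {ratio_to_estimate:.1f}")
print(f"Actual / SL6 port: {ratio_to_sl6:.1f}")
```

Under that assumption the estimate comes to 350 hours, which is consistent with "slightly more than" the 334 hours of the SL6 port, and the recorded total is a little over four times either figure.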

Lessons learnt

Normally in this section we would try to identify things which we would do differently if we had the chance to do the project again. Since this project took massively more effort than had originally been expected, there ought to be lessons to learn from it. However, it's difficult to identify much that could have been done differently.
  • With the benefit of hindsight we would have started the project somewhat later. Both RHEL7 and SL7 were released considerably later than we had expected. Before their release we aimed at the closest available approximation of the eventual target, which led us to port to the then-current Fedora release. When that was replaced by another Fedora release (and RHEL7 still hadn't appeared) we had to shift our porting target to the newer Fedora. However, neither hindsight nor the RHEL7 release schedule was available at the time.
  • It's clear that we massively underestimated the amount of effort needed to adjust LCFG to the world of systemd. systemd is profoundly different from earlier Linux startup mechanisms, not just in the way it starts processes but also because it seeks to control many aspects of their lifecycles, and because it replaces many formerly separate system services.
  • In the last few years we repeatedly considered attempting a port of LCFG to the then-latest release of Fedora. The idea was to familiarise ourselves with upcoming technologies before they hit RHEL and Scientific Linux, and to do some advance adjustment of LCFG. However, we rejected this approach each time it came up, preferring to concentrate on more immediately required work. If we had been able to make the time earlier - a big if, given that (for instance) the LCFG server and client needed major work before they could be ported to any new OS - then some of the adjustment to systemd might have been done then. However, it would still have had to be done.
Topic revision: r14 - 23 Apr 2015 - 10:59:17 - AlastairScobie
 