Emergency Software Updates Howto

If a critical security issue has been announced, the fix should be installed onto DICE machines as soon as possible. This page describes how to do that. When the issue affects most DICE machines it will typically be handled by MPU, but it can be done by any CO.

Get the packages

The fixed software needs to be available as an RPM for installation onto DICE machines. If there has already been a security announcement from the upstream provider (e.g. Scientific Linux), this should just involve grabbing the updates in the standard way (see MPUOsUpdates and talk to MPU). If the updates are not available from the primary provider, it is worth trying other distributions which are based on the same packages (e.g. RHEL or CentOS); in that case the SRPM will need rebuilding locally. Only if no compatible source packages are available is it worth manually patching the SRPM ourselves.

When rebuilding locally (either from a compatible source or from a manually patched SRPM), the revision field (the Release tag) in the specfile should be altered (e.g. to add a .inf) so that the locally built packages are easily identifiable. This also makes it easy to replace the locally built packages with those from upstream once they become available.
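As a rough illustration of the specfile change, the following Python sketch appends a local tag to the Release tag. It assumes that the "revision field" means the Release: line of the specfile and that the value contains no spaces; the function name add_local_tag is made up for this example, and editing the specfile by hand works just as well.

```python
import re

LOCAL_TAG = ".inf"  # suffix suggested in the text above


def add_local_tag(spec_text, tag=LOCAL_TAG):
    """Append `tag` to the first Release: line of a specfile.

    Hypothetical helper: a Release value that already ends with the
    tag is left alone, so running it twice is harmless.
    """
    def _tag(match):
        value = match.group(2)
        if value.endswith(tag):
            return match.group(0)  # already tagged, keep the line as it is
        return f"{match.group(1)}{value}{tag}"

    # (?m) makes ^ and $ match at each line of the specfile.
    return re.sub(r"(?m)^(Release:[ \t]*)(\S+)[ \t]*$", _tag, spec_text, count=1)
```

For example, a line reading "Release: 16%{?dist}.7" would become "Release: 16%{?dist}.7.inf", and all other lines are left untouched.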

Update package lists

The updates should first be installed and tested on machines following the develop release. If the packages come from the upstream provider then this is just a case of altering the updates package lists (e.g. lcfg_sl64_updates.rpms and lcfg_sl64_64_updates.rpms). If the packages are from any other source they should be added via the override lists (e.g. lcfg_sl64_override.rpms and lcfg_sl64_64_override.rpms). This makes the source of the updates clear and simplifies the process of switching over to those provided by upstream once they are available. Occasionally other headers and package lists will need altering (e.g. ed_sl6_env.rpms) because they do not have updates applied automatically.

All RPMs generated from the SRPM should be listed with a ? (question mark) prefix, for example:

/* Emergency SSL update on 2014/04/08 - locally built by Informatics
                  from the Centos packages */

?i686/openssl-1.0.1e-16.el6.7
?openssl-1.0.1e-16.el6.7
?openssl-debuginfo-1.0.1e-16.el6.7
?openssl-devel-1.0.1e-16.el6.7
?openssl-perl-1.0.1e-16.el6.7
?openssl-static-1.0.1e-16.el6.7

This avoids conflicts and makes the list safe for application on all machines.

Please add a descriptive comment to explain the changes along with the date on which the changes were made.

Do not forget that i686 architecture packages may be required on x86_64 machines (as in the example above).
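When an SRPM produces many binary packages it can be easier to generate the entries than to type them. The sketch below builds a commented block in the same layout as the example above. The function name override_block and its arguments are invented for illustration, and the output should still be checked by eye before it goes into a package list.

```python
def override_block(comment, version_release, packages, i686_packages=()):
    """Return package list text: a comment, then '?'-prefixed entries.

    Hypothetical helper. `packages` are the native-architecture package
    names; `i686_packages` are those also wanted as i686 builds on
    x86_64 machines.
    """
    lines = [f"/* {comment} */", ""]
    # i686 entries use the "?i686/name-version-release" form shown above.
    lines += [f"?i686/{name}-{version_release}" for name in i686_packages]
    lines += [f"?{name}-{version_release}" for name in packages]
    return "\n".join(lines)
```

Calling override_block("Emergency SSL update on 2014/04/08", "1.0.1e-16.el6.7", ["openssl", "openssl-devel"], ["openssl"]) gives a dated comment followed by one i686 entry and two native entries.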

Once you're happy that the change is correct and doesn't break anything it can be applied to all machines in the testing and stable releases. This is done via the live override list for the platform (e.g. live_sl6_overrides.rpms). Note that there is only one package list for all architectures so some things may need protecting with cpp conditionals (e.g. by checking for ARCH_X86_64).
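Because the live override list is shared by all architectures, entries that only make sense on x86_64 (such as the i686 packages) need a cpp guard. The sketch below shows one way the guarded text might be produced. It assumes that a plain #ifdef on ARCH_X86_64 is the right form of check, which should be confirmed against existing entries in the live list; the function name live_override_fragment is hypothetical.

```python
def live_override_fragment(common, x86_64_only, macro="ARCH_X86_64"):
    """Return package list text with x86_64-only entries under a cpp guard.

    Hypothetical helper; `macro` defaults to the ARCH_X86_64 symbol
    mentioned above, and the use of #ifdef is an assumption.
    """
    lines = list(common)
    if x86_64_only:
        # Only emit the guard when there is something to protect.
        lines += [f"#ifdef {macro}", *x86_64_only, "#endif"]
    return "\n".join(lines) + "\n"
```

With one common entry and one i686 entry, the result is the common line followed by the i686 line wrapped in #ifdef ARCH_X86_64 and #endif.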

Coordinate with IS

Any changes made to lcfg and ed level package lists will eventually have an effect on downstream users of lcfg (e.g. IS). The changes should be coordinated with IS; this is normally done by talking to Kenny MacDonald. It will sometimes be necessary to hold back a public announcement until IS have pulled the changes into the MDP layer and announced them to the COs in other schools, so that we do not unintentionally make other schools a target for attacks.

Identify affected machines

It may be necessary to identify machines which are particularly affected by the security issue. This can usually be done by poking around in the LCFG server dependency lists. For example, with the openssl security issue we identified DICE machines running apache with SSL by looking for those LCFG profiles which depend on apacheconf-ssl.h. The dependency list can be generated by using the /usr/lib/lcfg/server/utilities/dumpdeps utility on an LCFG slave server (e.g. lcfg1 or lcfg2).
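Once the dependency list has been saved to a file, picking out the affected profiles can be scripted. The sketch below is only an illustration: it assumes a make-style layout with one "profile: header header ..." line per profile, which may not match the real dumpdeps output, so the parsing will probably need adjusting. The function name and the host names used in testing are invented.

```python
def profiles_depending_on(dump_text, header):
    """Return a sorted list of profiles whose dependencies include `header`.

    ASSUMPTION: each line looks like "profile: dep1 dep2 ...". This
    layout is hypothetical and is not taken from dumpdeps itself.
    """
    affected = set()
    for line in dump_text.splitlines():
        profile, sep, deps = line.partition(":")
        if not sep:
            continue  # not a "profile: deps" line, skip it
        for dep in deps.split():
            # Accept a bare header name or one given with a directory path.
            if dep == header or dep.endswith("/" + header):
                affected.add(profile.strip())
                break
    return sorted(affected)
```

For the openssl case this would be called with the saved dependency text and "apacheconf-ssl.h" as the header.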

Announce to users

Before restarting services (e.g. apache) it is usually a good idea to send a message to sys-announce (and also add it to the computing blog) to warn users of any potential disruption. However, it may be necessary to be slightly vague and withhold certain details to avoid drawing attention to high-risk security holes before they are fixed; in that case full details should be sent out at a later date.

It may also be necessary to send out a message to encourage self-managed users to apply security fixes to their own machines.

-- StephenQuinney - 08 Apr 2014

Topic revision: r2 - 08 Apr 2014 - 15:25:52 - StephenQuinney
 