LCFG Check Scripts

This page describes the inner workings of the LCFG check scripts that are used to generate the LCFG status reports (https://lcfg-checks.inf.ed.ac.uk/reports/).

The code and configuration are in the dice-check project, which is in the dice Subversion repository.

Documentation

The main documentation is embedded in the check script and the Perl module. It can be viewed on the LCFG master server like this:

/usr/libexec/dice-check/check --help

and:

perldoc DICE::Check

In particular, details of all the existing check methods are given in the docs for the Perl module.

Script

The process of running tests on a set of machines is driven by the check script (scripts/check.cin), which takes an INI-style configuration file (e.g. conf/labcheck.ini). It is typically run from cron using k5start, because GSSAPI authentication is required to access data in the inventory. It also needs direct access to various data sources (e.g. the LCFG profiles and the stable release) which are stored on the LCFG master, so it is difficult to run it elsewhere.

Configuration

The process of checking a collection of machines is controlled using an INI-style configuration file, for example conf/labcheck.ini, which is found in the /etc/dice-check directory on the LCFG master server. Changes should normally be made to the copy in the project's svn directory; only change the config directly on the LCFG master if it is not possible to make a new package quickly.

The checks are done on all machines which the inventory places in a given list of rooms. The list of rooms to be checked is specified in a separate text file (e.g. conf/lab-rooms.txt). Basic wildcard matching is supported using the % (percent) symbol (e.g. IF-2% would match all rooms on level 2 of the Forum). Specific rooms can be excluded by prefixing them with a ! (exclamation mark) symbol (e.g. !IF-2.09); note that wildcards are not supported in exclusions. Comments are supported with a # (hash) prefix.
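The rules above can be illustrated with a minimal Perl sketch. The helper names read_rooms and room_selected are hypothetical, and the real script may well pass the % patterns straight to the inventory query (the syntax resembles an SQL LIKE pattern) rather than matching in Perl. The sketch only mirrors the documented semantics: % matches any run of characters, ! excludes an exact room name, and lines starting with # are ignored.

```perl
use strict;
use warnings;

# Hypothetical helper: parse rooms-file text into include patterns
# and a hash of exactly-named excluded rooms.
sub read_rooms {
    my ($text) = @_;
    my ( @include, %exclude );
    for my $line ( split /\n/, $text ) {
        $line =~ s/^\s+|\s+$//g;              # trim surrounding whitespace
        next if $line eq q{} || $line =~ /^#/; # skip blanks and comments
        if ( $line =~ /^!(.+)$/ ) {
            $exclude{$1} = 1;                 # exclusions: exact names only
        }
        else {
            push @include, $line;
        }
    }
    return ( \@include, \%exclude );
}

# Hypothetical helper: is the room matched by an include pattern
# and not explicitly excluded?
sub room_selected {
    my ( $room, $include, $exclude ) = @_;
    return 0 if $exclude->{$room};
    for my $pattern (@$include) {
        # % matches any run of characters; everything else is literal
        my $re = join '.*', map { quotemeta } split /%/, $pattern, -1;
        return 1 if $room =~ /\A$re\z/;
    }
    return 0;
}

my ( $inc, $exc ) = read_rooms(<<'END');
# Forum level 2, apart from one room
IF-2%
!IF-2.09
AT-5.05
END

for my $room (qw(IF-2.03 IF-2.09 IF-3.01 AT-5.05)) {
    printf "%-8s %s\n", $room,
        room_selected( $room, $inc, $exc ) ? 'checked' : 'skipped';
}
```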

Checks for individual machines can be disabled by defining the DICE_NO_CHECK macro somewhere in the LCFG profile.

The first part of the INI configuration (the defaults section) specifies the rooms list and the template used to generate the final report. It looks something like this:

[defaults]
module=DICE::Check
template=labcheck.tt
rooms=lab-rooms.txt

If the path to the rooms file is not absolute, it is expected to be in the standard configuration directory (e.g. /etc/dice-check/lab-rooms.txt). The templates are found in /usr/share/dice-check/templates.
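A short Perl sketch of that path rule, using the core File::Spec module. The resolve_rooms_path helper is hypothetical and not necessarily how the check script implements it.

```perl
use strict;
use warnings;
use File::Spec;

# Standard configuration directory, as described above.
my $CONF_DIR = '/etc/dice-check';

# Hypothetical helper: absolute paths are used as-is, relative
# paths are resolved against the configuration directory.
sub resolve_rooms_path {
    my ($path) = @_;
    return $path if File::Spec->file_name_is_absolute($path);
    return File::Spec->catfile( $CONF_DIR, $path );
}

print resolve_rooms_path('lab-rooms.txt'), "\n";
print resolve_rooms_path('/tmp/test-rooms.txt'), "\n";
```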

The output is sent to stdout unless the outfile parameter is set to a filename (e.g. /var/www/lcfg-checks/reports/index.html).

Some of the defaults can be overridden on the command line, which is useful for manually testing new config or templates, for example:

/usr/libexec/dice-check/check --conf /etc/dice-check/labcheck.ini --template labcheck.html.tt --outfile /var/www/lcfg-checks/reports/index.html
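As a rough illustration of how the defaults section and the command-line options could be combined, the following Perl sketch parses INI text into a hash of sections and lets the --conf, --template and --outfile options override the defaults. The parse_ini helper is hypothetical (the real script may use a CPAN INI module); the option handling uses the core Getopt::Long module.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical INI reader: returns { section name => { key => value } }.
sub parse_ini {
    my ($text) = @_;
    my ( %ini, $section );
    for my $line ( split /\n/, $text ) {
        next if $line =~ /^\s*(?:[#;].*)?$/;    # skip blanks and comments
        if ( $line =~ /^\s*\[\s*(.+?)\s*\]\s*$/ ) {
            $section = $1;                      # e.g. "defaults", "Check ssh"
            $ini{$section} ||= {};
        }
        elsif ( defined $section && $line =~ /^\s*([^=]+?)\s*=\s*(.*?)\s*$/ ) {
            $ini{$section}{$1} = $2;
        }
    }
    return \%ini;
}

my $ini = parse_ini(<<'END');
[defaults]
module=DICE::Check
template=labcheck.tt
rooms=lab-rooms.txt

[Check ssh]
desc=SSH service is listening
timeout=3
END

# Simulated command line; values given here override [defaults].
my @argv = ( '--template', 'labcheck.html.tt' );
my %cli;
GetOptionsFromArray( \@argv, \%cli, 'conf=s', 'template=s', 'outfile=s' )
    or die "bad options\n";

my %settings = ( %{ $ini->{defaults} }, %cli );
print "$_ = $settings{$_}\n" for sort keys %settings;
```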

Tests

Tests are enabled in the INI configuration by specifying a Check block with all necessary parameters. For example:

[Check ssh]
desc=SSH service is listening
requires=is_dice
timeout=3
identifier=SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.2

The name of the test must match a subroutine in the DICE::Check module (lib/DICE/Check.pm.cin) with a check_ prefix; for example, [Check ssh] calls check_ssh. Alternatively, the method can be named explicitly using the method parameter; in the following example the check_report_entry method is called:

[Check mouse]
desc=Is there a mouse attached?
requires=has_report
method=check_report_entry
cmp= ==
key=peripherals.has_mouse
expected=1
message=Mouse is missing
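The name-to-method rule can be sketched in Perl as follows. My::Check is a hypothetical stand-in for DICE::Check with stub methods, and resolve_method is a hypothetical helper. The sketch assumes the method parameter, when present, takes precedence over the check_NAME convention, and that each method is called with the class followed by the inventory, report, profile and parameters, as described in the Code section.

```perl
use strict;
use warnings;

# Hypothetical stand-in for DICE::Check with stub check methods.
package My::Check;
sub check_ssh          { return (1) }
sub check_report_entry { return ( 0, 'Mouse is missing' ) }

package main;

# Hypothetical helper: choose the method for a [Check NAME] section.
sub resolve_method {
    my ( $module, $name, $params ) = @_;
    my $method = $params->{method} // "check_$name";
    my $code   = $module->can($method)
        or die "No method '$method' in $module for check '$name'\n";
    return $code;
}

for my $test ( [ 'ssh', {} ], [ 'mouse', { method => 'check_report_entry' } ] ) {
    my ( $name, $params ) = @$test;
    my $code = resolve_method( 'My::Check', $name, $params );
    # class first, then inventory, report, profile and parameters
    my ( $ok, @messages ) = $code->( 'My::Check', undef, undef, undef, $params );
    printf "%s: %s %s\n", $name, $ok ? 'pass' : 'fail', join( '; ', @messages );
}
```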

The desc field is a short description of the test.

The requires parameter lists other tests which must pass for this test to be run. Usually these will be one or more of has_profile, has_report and is_dice. The main benefit is clearer reports: they will not be filled with additional errors when a machine is missing a profile or is not actually a DICE machine.
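A minimal Perl sketch of how requires could gate the tests, assuming tests run in configuration order so that prerequisites are evaluated first. run_tests is a hypothetical helper, and the outcomes (and the dependency of is_dice on has_profile) are made up for illustration. A test whose prerequisite failed, or has not yet run, is marked as skipped rather than failed.

```perl
use strict;
use warnings;

# Hypothetical helper: run tests in order, skipping any test whose
# prerequisites did not pass. $run->($name) returns true on pass.
sub run_tests {
    my ( $tests, $run ) = @_;
    my %result;    # name => 'pass' | 'fail' | 'skip'
    for my $test (@$tests) {
        my ( $name, @requires ) = @$test;
        if ( grep { ( $result{$_} // '' ) ne 'pass' } @requires ) {
            $result{$name} = 'skip';
            next;
        }
        $result{$name} = $run->($name) ? 'pass' : 'fail';
    }
    return \%result;
}

# Made-up machine with no client report: mouse is skipped, not failed.
my %outcome = ( has_profile => 1, has_report => 0, is_dice => 1, ssh => 1, mouse => 0 );
my $result = run_tests(
    [   ['has_profile'], ['has_report'], [ 'is_dice', 'has_profile' ],
        [ 'ssh', 'is_dice' ], [ 'mouse', 'has_report' ],
    ],
    sub { $outcome{ $_[0] } },
);
print "$_: $result->{$_}\n" for sort keys %$result;
```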

The following tests are available:

  • check_hw_header - check that the hardware header is correct (based on data in conf/models.yml)
  • check_is_dice - check that the machine has a dice profile
  • check_kernel - check the kernel version
  • check_lab_header - check if the profile includes the correct lab header
  • check_last_report - check when the most recent clientreport was submitted
  • check_lcfg_release - check if the machine is on the most recent LCFG weekly stable release
  • check_monitors - check the configuration of the monitors
  • check_network - check the network speed
  • check_office_header - check if the profile includes the correct office header
  • check_os - check the machine is running the expected OS (e.g. Ubuntu Focal)
  • check_os_header - check if the profile includes the correct OS header
  • check_partitions - check the disk partition layout
  • check_pkgupdates - check that there were no errors when updating packages
  • check_report_entry - a generic check which can be used to verify entries in the client report
  • check_ssh - check that the SSH daemon is listening
  • check_virt - check that virtualisation is enabled in the BIOS

Templates

The template is processed using the Perl Template Toolkit. The following parameters are passed in:

  • by_location - A hash keyed on location (e.g. IF-2.09); each value is a Set::Scalar of the names of hosts with problems in that location
  • by_key - A hash keyed on the name of the failed check (e.g. has_profile); each value is a Set::Scalar of the names of hosts which failed that check
  • by_group - A hash keyed on the LCFG group, as used on the status pages (e.g. DICE/Servers/Managed Platforms Servers); each value is a Set::Scalar of the names of hosts with problems in that group
  • by_host - A simple hash keyed on host name; each value is a list of the problems found for that host
  • room_info - A hash of information for the rooms checked (mostly just a count of errors per room)
  • warnings - A list of general (not host-specific) warnings (things like rooms which appear to contain no machines)

The best reference for writing a template is the existing lab-check templates: labcheck.tt generates a simple text file and labcheck.html.tt generates an HTML web page.

See metacpan for information on the Set::Scalar module. Note, in particular, that the elements method must be used to get a list of all the items in the set.

Code

Each test must have a subroutine in the DICE::Check Perl module. The test is called with a standard list of arguments which provide access to data from the inventory, client report, lcfg profile and any parameters in the configuration block.

A test subroutine would be specified something like this:

sub check_ssh {
    my $class = shift @_;
    my ( $inv, $report, $profile, $params ) = @_;

    # ... test logic goes here ...
}

The subroutine is expected to return a boolean value and a list of any explanatory messages, like this:

    return ( $ok, @message );

The inventory data is a reference to a simple hash which can be accessed like this:

    my $host = $inv->{hostname};

The profile data is a reference to a simple array which can be iterated over like this:

    for my $line ( @{$profile} ) {
        # examine each profile line here
    }
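Putting the pieces above together, here is a hypothetical check written in the expected shape. check_profile_pattern and its pattern parameter are illustrative only and do not exist in DICE::Check, the profile lines are made up, and $params is read as a plain hash for brevity (a real check should use the get_param_value helper).

```perl
use strict;
use warnings;

package My::Check;

# Hypothetical check: passes if any profile line matches the 'pattern'
# parameter. $params is read as a plain hash here for brevity.
sub check_profile_pattern {
    my $class = shift @_;
    my ( $inv, $report, $profile, $params ) = @_;

    my $host    = $inv->{hostname};
    my $pattern = $params->{pattern};

    for my $line ( @{$profile} ) {
        return (1) if $line =~ /$pattern/;    # pass, no messages
    }
    return ( 0, "$host: no profile line matches /$pattern/" );
}

package main;

my $inv     = { hostname => 'example' };
my $profile = [ 'alpha = 1', 'beta = 2' ];    # made-up profile lines
my ( $ok, @message ) = My::Check->check_profile_pattern(
    $inv, undef, $profile, { pattern => '^beta' } );
print $ok ? "pass\n" : "fail: @message\n";
```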

Values for the parameters specified in the configuration file can be accessed using a helper method like this:

    my ($timeout)     = get_param_value( $params, 'timeout', 3 );
    my ($expected_id) = get_param_value( $params, 'identifier', undef );

The third argument is the default value, which is used when the parameter is not set in the configuration block.
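The implementation of get_param_value is not shown on this page; the following Perl stand-in assumes $params is a plain hash of the key=value pairs from the Check block, which is enough to illustrate the default-value behaviour. The retries parameter is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get_param_value, assuming $params is a plain
# hash of the key=value pairs from the [Check ...] block.
sub get_param_value {
    my ( $params, $name, $default ) = @_;
    return exists $params->{$name} ? $params->{$name} : $default;
}

my $params = { desc => 'SSH service is listening', timeout => 5 };
my ($timeout)     = get_param_value( $params, 'timeout',    3 );     # set in config
my ($expected_id) = get_param_value( $params, 'identifier', undef ); # not set
my ($retries)     = get_param_value( $params, 'retries',    2 );     # hypothetical, default used
```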

The client report data can be accessed using a helper method like this:

    my $virt_enabled = get_report_data( $report, qw(os virt_enabled) );

The arguments after the report form a list of keys; in this example the value of the os.virt_enabled key is retrieved.
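Similarly, a hypothetical stand-in for get_report_data, assuming the client report is a nested hash and the keys are followed in order. Splitting a dotted key such as peripherals.has_mouse (from the Check mouse example) into separate keys is an assumption about how check_report_entry uses its key parameter.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get_report_data, assuming a nested-hash report.
# Follows the keys in order; returns undef (in scalar context) if any
# key along the path is missing.
sub get_report_data {
    my ( $report, @keys ) = @_;
    my $node = $report;
    for my $key (@keys) {
        return unless ref $node eq 'HASH' && exists $node->{$key};
        $node = $node->{$key};
    }
    return $node;
}

my $report = {
    os          => { virt_enabled => 1 },
    peripherals => { has_mouse    => 0 },
};

my $virt_enabled = get_report_data( $report, qw(os virt_enabled) );

# Assumed handling of a dotted config key such as peripherals.has_mouse
my $has_mouse = get_report_data( $report, split /\./, 'peripherals.has_mouse' );
my $missing   = get_report_data( $report, qw(os no_such_key) );
```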

-- StephenQuinney - 28 Apr 2021

Topic revision: r4 - 09 Jun 2021 - 08:46:00 - StephenQuinney