System Monitoring

Overview

In addition to producing the LCFG monitoring service, the monitoring project also produced the Informatics Jabber system and the apacheconf LCFG component, and drove the development of the remctl remote command execution service.

Code

The monitoring project produced a substantial quantity of new code:

  • lcfg-monitor (7621 lines) provides the system-independent framework on which the Nagios-specific portions of the system are built. Much of the code here was developed to work around problems with the current LCFG implementation, in particular its handling of spanning maps. When the LCFG server is reimplemented, much of this component could be reconsidered. In addition, the package contains a number of modules which may be of more general use:
    • LCFG::XML::Parser - an XML::Parser based LCFG profile parser, that is at least an order of magnitude faster than the one currently used by LCFG
    • LCFG::Monitoring::Template - an object-oriented LCFG templating engine, built on top of Template::Toolkit. Unlike the current LCFG templating engine, this gives direct access to the underlying Template::Toolkit options, which allows the disabling of interpolation and case-insensitivity. This is essential for handling some configuration file types.
    • LCFG::Monitoring::LocalProfile - an object-oriented interface to a machine's local LCFG resources

  • lcfg-nagios (11356 lines) contains the portions of the monitoring system that are specific to the Nagios engine, along with the LCFG component definitions to manage Nagios clients and servers.

  • lcfg-jnotify (1094 lines) contains the system which performs Jabber-based notifications. It consists of a Python Twisted daemon which is permanently connected to our Jabber server and monitors the connectivity of all of the users in its roster. The daemon accepts connections from the monitoring system on a local Unix socket, and forwards the notifications it receives on to users based on each user's presence setting.

  • python-gss (1034 lines of code). A locally developed Python wrapper for the GSSAPI library, this provides GSSAPI client functionality for lcfg-jnotify. At the moment this library only provides client-side bindings, although it is planned to implement server-side commands as required by other projects.

  • lcfg-remctl (514 lines of code). Roger performed the initial development of this component, which configures the remctl daemon used for passive monitoring. Later, the component was reimplemented in Perl, and reworked so that remctl itself is managed by xinetd.

  • lcfg-jabberd (2205 lines of code). The LCFG component to manage a Jabberd2 service was originally developed as part of this project.

  • lcfg-apacheconf. Development of the new Apache configuration component was shared between this and the iFriend project.
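
The presence-based forwarding performed by lcfg-jnotify can be illustrated with a small sketch. This is not the daemon's actual code: the Notification type, the function names, and in particular the set of presence states that permit delivery are all assumptions made for illustration, and the example Jabber IDs are invented.

```python
from dataclasses import dataclass

# Assumption: presence states that count as "willing to receive".
# The real daemon's policy is not documented on this page.
DELIVERABLE = {"available", "chat"}


@dataclass
class Notification:
    recipient: str  # Jabber ID of the user to notify
    body: str       # message text received from the monitoring system


def should_forward(presence_by_user, note):
    """Return True if the recipient's last known presence allows delivery."""
    # Users missing from the roster are treated as unavailable.
    presence = presence_by_user.get(note.recipient, "unavailable")
    return presence in DELIVERABLE


def route(presence_by_user, notes):
    """Split notifications into (forwarded, skipped) by recipient presence."""
    forwarded, skipped = [], []
    for note in notes:
        if should_forward(presence_by_user, note):
            forwarded.append(note)
        else:
            skipped.append(note)
    return forwarded, skipped
```

In this sketch the roster is a plain dictionary mapping Jabber IDs to presence strings; a real daemon would keep it up to date from the presence stanzas sent by the Jabber server.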

In addition, monitoring for a number of components was developed as a proof of concept:

  • nagios_server
  • openssh
  • apacheconf
  • kerberos
  • openldap
  • jabberd

Configuration

The monitoring system is configured entirely by LCFG, using the option headers:

  • dice/options/nagios_server.h - for servers
  • dice/options/nagios_client.h - for clients (machines being monitored)
  • dice/options/nagios_packages.h - for development hosts

The Jabber system is also configured by LCFG:

  • live/jabber-server.h is the primary entry point for the Informatics Jabber server
  • dice/options/jabber-server.h provides support for running a Jabber service for other domains.

remctl is also LCFG managed:

  • dice/options/remctl.h configures a remctl server

Authentication

Jabberd

Connections to Jabber over an SSL-encrypted link may be authenticated either by password or by Kerberos. Non-SSL connections may only be Kerberos authenticated.
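
The rule above can be written down as a small decision function. This is only an illustrative sketch of the policy as stated; the function names and the mechanism labels "password" and "kerberos" are hypothetical, and the real enforcement happens in the Jabberd configuration rather than in code like this.

```python
def allowed_mechanisms(ssl_encrypted):
    """Return the authentication methods permitted for a connection."""
    if ssl_encrypted:
        # Encrypted links may use either method.
        return {"password", "kerberos"}
    # Unencrypted links must not carry passwords, so only Kerberos is allowed.
    return {"kerberos"}


def may_authenticate(ssl_encrypted, mechanism):
    """Return True if the given mechanism is acceptable on this connection."""
    return mechanism in allowed_mechanisms(ssl_encrypted)
```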

Nagios

  • Connections to the Nagios frontend are authenticated through cosign. iFriend access is not permitted.
  • Connections between the monitoring systems use remctl, which is Kerberos authenticated
  • The Nagios->Jabber link is Kerberos authenticated
  • The Nagios->email link is not authenticated

Backups

  • Nagios contains no data of long-term value, so no backup arrangements are required.
  • The sqlite Jabber database is backed up via rsync to a remote host, and then to tape.

Service Restoration

Bringing up a new machine with the nagios_server.h header included, and switching the nagios.inf.ed.ac.uk CNAME to it, will restore the Nagios service.

For the Jabber service, bringing up a new machine with the live/jabber-server.h header included, and running om jabberd restore /path/to/backup/of/database is sufficient to restore the system.

Monitoring

The two monitoring services, nagios and nagios2, monitor each other using the passive monitoring support provided through the remctl component. Jabberd is monitored via the monitoring system.
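
As an illustration of what a passive result looks like once it reaches Nagios, the sketch below formats a line in Nagios's standard external command syntax (PROCESS_SERVICE_CHECK_RESULT). It is an assumption that results delivered over remctl are ultimately handed to Nagios in this form; the function name and the host and service names used in the usage note are hypothetical.

```python
import time

# Standard Nagios plugin return codes.
STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def passive_result(host, service, state, output, now=None):
    """Format one passive service check result as an external command line."""
    timestamp = int(time.time() if now is None else now)
    # An external command occupies a single line, so flatten any newlines.
    text = output.replace("\n", " ")
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{STATES[state]};{text}"
    )
```

For example, passive_result("nagios2", "nagios", "OK", "running") would describe a healthy nagios service on the host nagios2, stamped with the current time.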

Dependencies

  • Whilst the monitoring system has no code level dependencies on the rest of LCFG (the LCFG codebase is insufficiently flexible to permit reuse), it is heavily dependent on the behaviour of the LCFG servers. Any changes in LCFG server behaviour, profile serving, or spanning map computation may have effects upon the monitoring system.
  • The web frontend depends on cosign and apacheconf
  • Passive monitoring (including monitoring of Nagios itself) depends on remctl and Kerberos
  • remctl depends upon the xinetd component for its invocation, and uses the etcservices and tcpwrappers components to configure the machine
  • Jabberd depends upon the Kerberos and X509 components to provide it with its key material.

Documentation

Full documentation for the monitoring system is available from http://www.dice.inf.ed.ac.uk/units/infrastructure/Documentation/Monitoring

A number of other people have developed translators based on this documentation.

Topic revision: r1 - 28 Nov 2007 - 10:52:13 - SimonWilkinson