This document attempts to give a comprehensive overview of the LDAP setup in Informatics.
Software
We use OpenLDAP on all our DICE machines. We build our own openldap RPMs. We don't make an effort to keep up with current openldap releases on standard DICE client machines. On servers, our upgrade policy is:
- We aim to keep up to date with the latest OpenLDAP release (give or take a version or two)
- With new releases we deploy to one slave first, for testing, then gradually to the others. Once a release has proven to be reliable, we deploy it on the master.
The openldap daemon is called slapd.
We produce the following RPMs:
- openldap
- openldap-libs
- openldap-server
- openldap-debuginfo
For local configuration there is also the openldap-schema RPM, which installs the LDAP schema (see the schema notes below).
For openldap's underlying database, we use the provided
mdb format. In the past we used bdb, which we built and distributed
ourselves.
Server setup and configuration
Our account management system Prometheus is responsible for populating
the LDAP tree with user, group and netgroup information. See
PrometheusOverview for more details. The prometheus flow diagram shows
which parts of LDAP are synchronised from Prometheus.
The following gives a brief summary of the main branches of the LDAP
tree, what they represent and where the data comes from:
- ou=AutofsMaps - autofs automount maps. Updated on the rfe server (currently danio) by /usr/bin/ldapBuildAutofsMap.
- ou=Capabilities - for authorisation - groupOfNames objects with lists of users who possess that capability. Managed by Prometheus.
- ou=Group - posixGroup objects to provide group name and gid mapping. Managed by Prometheus.
- ou=Identities - currently unused, but might be used in the future.
- ou=Maps - (historic) amd map information. Kept up to date by manual runs of ldapBuildAmdMaps
- ou=Netgroup - for authorisation - nisNetgroup objects with lists of users/hostnames (there are a handful of host-specific netgroups). Managed by Prometheus.
- ou=Partitions - (historic) NFS partition information. Used by ldapBuildAmdMaps. Kept up to date by ldappartsync on (currently) danio
- ou=People - user account information (rfc2307). Managed by Prometheus.
- ou=rfeMaps - rfe map data. Kept up to date by ldaprfemapsync on (currently) danio
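Any of these branches can be inspected with an anonymous ldapsearch. A minimal sketch, assuming a base DN of dc=inf,dc=ed,dc=ac,dc=uk (an assumption, not confirmed above):

    # list group name/gid mappings from ou=Group (base DN is an assumption)
    ldapsearch -x -H ldap://dir.inf.ed.ac.uk \
        -b 'ou=Group,dc=inf,dc=ed,dc=ac,dc=uk' \
        '(objectClass=posixGroup)' cn gidNumber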
Master server
There is one master server (currently polly) sited in the Forum server room. All updates have to be made to the master.
Disk setup
We configure separate disk partitions for the following:
- /var/openldap-data - the openldap data directory
- /var/openldap-snapshot - snapshots of the openldap database
Configuration
The master server is configured by LCFG resources via the <dice/options/openldap-server-common.h> header. Specific configuration is controlled with appropriate #define statements, which can bring in other header files - consult the header for more information.
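A master profile might therefore contain something like the following sketch (the macro name here is hypothetical - consult the header for the real #defines):

    /* hypothetical LCFG profile fragment */
    #define DICE_OPENLDAP_MASTER   /* assumed macro name */
    #include <dice/options/openldap-server-common.h>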
The LDAP schema is installed on all machines by RPM (openldap-schema). In order to make a schema change, you must ensure that all machines have updated to the new version of the RPM before making any change to LDAP data that uses the new schema.
Access control
There are no filter holes for the master - so there is no visibility
from outside the Informatics network.
Reads are permitted for all (both authenticated and anonymous).
Writes are permitted from:
- users who possess the ldap/write entitlement (essentially sysmans)
- the prometheus master server principal (prometheus/fqdn.of.server@INF.ED.AC.UK)
Backups
An hourly cron job runs om openldap save. This uses slapcat to dump an LDIF file of the full openldap database to /var/openldap-snapshot.
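In essence the job does something like the following (a sketch - the actual filename convention is an assumption):

    # dump the full database to a timestamped LDIF snapshot
    slapcat -l /var/openldap-snapshot/ldap-$(date +%Y%m%d-%H%M).ldif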
The master keeps three months of backups. This partition is rsynced nightly to a mirror server (currently maunsell). Also note that the slaves should always have a full copy of the ldap directory.
Slave servers
There are currently three site slaves - nelson (IF), campbell (AT) and klein (KB). These slaves are kept in sync with the master server via openldap syncrepl technology - changes are pushed to the slaves as soon as they happen on the master. There are also four "lightweight" slaves (damflask, hutter, redmires and schneider), which are hosted on virtual machines. The only functional differences between site slaves and lightweight slaves (other than the former being physical and the latter being virtual) are:
- site slaves keep more backups (see below).
- site slaves have the same openldap disk partitions as the master; lightweight slaves have the system default
All slaves are configured using the <dice/options/openldap-server-common.h> header with appropriate #define statements.
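A syncrepl consumer stanza has roughly the following shape (illustrative only - the rid, bind identity and credentials are assumptions, not our actual configuration):

    # sketch of a slapd.conf syncrepl stanza on a slave
    syncrepl rid=001
             provider=ldap://polly.inf.ed.ac.uk
             type=refreshAndPersist
             retry="60 +"
             searchbase="dc=inf,dc=ed,dc=ac,dc=uk"
             bindmethod=simple
             binddn="cn=replicator,dc=inf,dc=ed,dc=ac,dc=uk"
             credentials=secret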
Backups
Backups are made to /var/openldap-snapshot, as on the master, but are not rsynced anywhere. Lightweight slaves keep one day of backups, site slaves keep one month.
TLS
All slaves are configured with TLS. This is done by including (via the common header) <dice/options/openldap-tls-server.h>, which uses the lcfg-x509 component to acquire a locally-signed certificate.
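A quick way to verify that TLS is working on a slave (assuming StartTLS on the standard port):

    # -ZZ makes the search fail unless StartTLS negotiation succeeds
    ldapsearch -ZZ -x -H ldap://dir.inf.ed.ac.uk -b '' -s base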
Access control
We restrict access to slapd via our firewall to 'edlan', 'edlan172' and 'tardis', as defined in <live/ipfilter.h>.
We use tcpwrappers to restrict access to:
- EdLAN:
- 129.215.0.0/255.255.0.0
- 192.168.0.0/255.255.0.0
- 172.16.0.0/255.240.0.0
- [2001:630:3c1::]/48
- TARDIS:
- 193.62.81.0/255.255.255.0
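The resulting tcpwrappers rule is roughly equivalent to the following hosts.allow entry (a sketch - the real file is generated for us):

    # /etc/hosts.allow (illustrative)
    slapd : 129.215.0.0/255.255.0.0 192.168.0.0/255.255.0.0 \
            172.16.0.0/255.240.0.0 [2001:630:3c1::]/48 \
            193.62.81.0/255.255.255.0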
In openldap ACLs:
We allow access to ou=People for everyone.
We allow access to the rest of the tree for:
- authenticated users
- localhost
- those from 'inf.ed.ac.uk' (via a DNS reverse lookup)
This is configured from <dice/options/openldap-edlan-acls.h>, included via the common server header.
We make data visible to EdLAN for Virtual DICE.
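Expressed as slapd ACLs, those rules would look roughly like this (illustrative only - the base DN is assumed and the real rules live in the header above):

    # sketch, not the production configuration
    access to dn.subtree="ou=People,dc=inf,dc=ed,dc=ac,dc=uk"
            by * read
    access to *
            by users read
            by peername.ip=127.0.0.1 read
            by domain.subtree=inf.ed.ac.uk read
            by * none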
Monitoring
The LDAP service on the master and site slave servers is monitored via Nagios - the configuration for this is in openldap-server-common.h.
Agents
We no longer have agents which update LDAP data. This description is included for historical reasons, and in case we decide to use agents again ...
One thing that is worthy of note was our use of agents for jobs which updated LDAP data. A good example was the syncdbldap script, which ran every day on greenford to sync changes from the school database into LDAP. This job was given the appropriate permissions for making changes by an agent in LDAP called ldapsyncagent, which had the krbName attribute of ldapsync/greenford.inf.ed.ac.uk - a cron job on greenford authenticated (using a keytab) as this identity before running syncdbldap. The agent had a role of ldapuseradmin, which is what gave it the appropriate capabilities to make changes to those areas of LDAP (as defined by the acls set on the master). A small number of these agents were in use.
IPv6
All of the LDAP servers have IPv6 addresses.
DNS
There is a round-robin DNS alias - dir.inf.ed.ac.uk - which consists of all the slaves, both IPv4 and IPv6.
There is a separate, IPv4-only, alias: dirv4.inf.ed.ac.uk, which has IPv4 addresses for all the slaves.
The _ldap._tcp.inf.ed.ac.uk SRV record is also available - this is used by rfe and sssd (see below).
There is also a separate SRV record used by autofs: _ldap._tcp.mapdir.inf.ed.ac.uk. The use of a separate record dates back to when we were running separate LDAP servers for autofs maps (for historical reasons).
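These can all be checked with standard DNS tools, for example:

    # the round-robin alias and the SRV records
    host dir.inf.ed.ac.uk
    dig +short SRV _ldap._tcp.inf.ed.ac.uk
    dig +short SRV _ldap._tcp.mapdir.inf.ed.ac.uk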
Other slave servers
There are other machines which are configured to be openldap syncrepl
slaves, but which are not part of the LDAP service. These are
typically infrastructure machines and are configured in this way so that they have no dependency on the central LDAP service in the event of network unavailability. They run (and use) their own LDAP server and are configured via the header <dice/options/openldap-run-and-use-local-server.h>.
slaprepl
DICE machines used to run their own LDAP servers, replicating hourly from the master using a locally written slaprepl tool. This is now deprecated in favour of sssd, as detailed below. It is still possible to configure a machine in this way, however (see openldap-server-common.h for details).
DICE clients
sssd
All DICE clients use
sssd. This provides a secure (TLS)
interface with the seven LDAP slave servers for user, group and
netgroup information. It is configured through the
lcfg-sssd
component (which is built on top of the
lcfg-inifile
component) and the
<dice/options/sssd.h> header. It provides failover (should the LDAP
server to which it is connected become unavailable) and caching.
Note that
sssd
currently defaults to establishing connections via IPv4
only.
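The generated configuration has roughly the following shape (a sketch only - the option values here are assumptions, the real ones come from the LCFG resources):

    # illustrative fragment of /etc/sssd/sssd.conf
    [sssd]
    services = nss, pam
    domains = inf

    [domain/inf]
    id_provider = ldap
    ldap_uri = _srv_
    ldap_search_base = dc=inf,dc=ed,dc=ac,dc=uk
    ldap_id_use_start_tls = true
    cache_credentials = true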
sssd client troubleshooting
sssd is managed by systemd, so the standard systemd tools can be used, e.g.:
- systemctl status sssd to check the status of sssd
The lcfg-sssd component writes the /etc/sssd/sssd.conf file and starts/restarts sssd as necessary.
To completely clear the sssd cache (as root):
- systemctl stop sssd
- rm -f /var/lib/sss/db/*
- systemctl start sssd
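Alternatively, cached entries can be invalidated in place (without removing the cache files) using the standard sss_cache tool:
- sss_cache -E to invalidate all cached entries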
autofs
DICE clients also communicate with LDAP slave servers to obtain map
information for autofs. This does not currently use TLS, but really
should.
Disaster Recovery
For this purpose, we assume that a disaster involves the unavailability of the master server (unavailability of slaves can be solved by moving DNS aliases and SRV records around). This section addresses the steps required to move the LDAP master to a different machine. Where the master has become unavailable, all slaves will continue to work normally, but will not receive updates.
Note that klein, at KB, is the designated disaster recovery machine for LDAP.
The following steps should be undertaken to move the LDAP master to a different machine:
- Get a copy of the last (good) LDIF backup from either the snapshot directory on the current master, or from the mirror
- On the new master machine, use om openldap load to load the LDIF file (a sketch of this step follows this list). Note that this may not be necessary if you're promoting a slave to master, as it may already have an up-to-date replicated copy of the master data. In some respects this may depend on the nature of the disaster (e.g. if data corruption is suspected).
- Give the new server the appropriate LCFG headers. This is likely to include at least
<dice/options/openldap-server-common.h>
with appropriate #define
statements. Check the profile in case there's anything which should be in a header, but isn't.
- Check (and transfer) any DNS aliases (e.g. ldap)
- Check that slave servers are receiving updates.
- Check that prometheus updates are working
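For the restore step, om openldap load wraps something like the following (a sketch - the snapshot filename is illustrative and the component handles the details):

    # on the new master: stop slapd, load the LDIF, restart (illustrative)
    systemctl stop slapd
    slapadd -l /var/openldap-snapshot/ldap-20190312-0400.ldif
    systemctl start slapd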
Useful Documentation
--
TobyBlake - 12 Mar 2019