Cause of the recent printing problems

Recently the printing service within the School of Informatics has been subject to several unplanned interruptions to service. Two separate problems have been observed on these occasions:

  1. When submitting a print job, an error message is printed warning of a loop in the printcap for this printer. No print jobs can be submitted.

  2. Print jobs can be submitted as normal, but there is a high load on the print server. On closer inspection, this load is caused by frequent calls to LDAP (on the order of one every two seconds or so) from LPD. Further investigation shows that the print server is sending jobs to itself rather than sending them to the printer, thereby forming a tight loop.

Restarting the LPRng component solves both problems.

An obvious place to start the investigation was the printcap entry being used by the server. A print server uses one printcap entry for a given printer and a print client a different one: on the server the data to be printed is sent directly to the printer, whereas on a client the data is sent to the print server. In Informatics, printcap entries are held in LDAP, with the entry for each printer containing enough information to generate both a client and a server printcap entry. A locally written script called pcap-query generates printcap entries from the LDAP data and produces server or client entries depending on the host on which it is run.

Clients generate a printcap entry from LDAP each time they send a job to the printer. Because of previous problems with LDAP, however, print servers generate a flat text version of the printcap data at regular intervals and use that to look up printcap entries. This is handled by the LPRng component, which is also used to regenerate the lpd.conf and lpd.perms files.

If the server's printcap file contained client printcap entries, this would account for the behavior we were seeing. Unfortunately, each time the problem occurred, a check of the local printcap file showed that, as expected, all the printers served by this server were using the server version of the printcap entry. The cause of the problem appeared to lie elsewhere.
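To make the client/server distinction concrete, here is a minimal Python sketch of how a generator in the spirit of pcap-query might choose which kind of entry to emit. It is not the real script: the function name, the dictionary keys, the addresses and the exact printcap fields are all assumptions made for illustration. The only idea taken from the description above is that the choice depends on whether the host running the script is the printer's server.

```python
def build_printcap_entry(printer, local_addresses):
    """Return a printcap-style entry for `printer`.

    `printer` is a hypothetical dict standing in for the LDAP record;
    `local_addresses` is the set of IP addresses this host believes it owns.
    """
    name = printer["name"]
    if printer["server_address"] in local_addresses:
        # Server entry: data is sent directly to the printer
        # (written here in the host%port style, as an assumption).
        destination = f"{printer['printer_address']}%{printer['printer_port']}"
    else:
        # Client entry: data is sent to the queue on the print server.
        destination = f"{name}@{printer['server_name']}"
    return f"{name}:\n  :lp={destination}\n"


# Hypothetical printer record; all values are illustrative only.
printer = {
    "name": "lp1",
    "server_name": "printserver.example.org",
    "server_address": "192.0.2.1",
    "printer_address": "192.0.2.50",
    "printer_port": 9100,
}

server_entry = build_printcap_entry(printer, {"192.0.2.1"})
client_entry = build_printcap_entry(printer, {"192.0.2.99"})
```

Note that in this sketch a host which does not recognise the server address as one of its own falls through to the client branch, even if it really is the print server.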

In fact, we believe that the cause of the problem was indeed the printcap, but that this was masked by a facet of LPD's behavior that we were unaware of: by default, LPD caches printcap entries once it has read them. Once cached, the printcap is only reread if the daemon is restarted or sent a HUP signal. It is therefore quite possible for the printcap cached by LPD to differ from the /etc/printcap file on the server, if the contents of the file have changed but LPD has not been restarted. Sure enough, we discovered that the LPRng component only restarted LPD when its configuration files were updated, not when the /etc/printcap file was updated. In the initial printing setup, the configuration files were only updated at 1:00am and 8:00am, so these were the only times at which LPD would reread the printcap file. The printcap file was updated hourly, so for 22 hours out of the day the printcap file could be out of sync with LPD's cached copy. Restarting the LPRng component solved the problem because this restarted LPD and caused it to reread the (now correct) printcap file.
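One way to avoid this kind of drift is to signal LPD whenever the printcap file itself changes, not only when the configuration files do. The following Python sketch illustrates the idea and is not the LPRng component's actual code: the function names, the state file and the way the daemon's PID is obtained are assumptions. It records a digest of the printcap file and calls a reload action only when the digest differs from the one recorded last time.

```python
import hashlib
import os
import signal


def file_digest(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def reload_if_changed(printcap_path, state_path, reload):
    """Call `reload()` if the printcap differs from the last recorded digest.

    Returns True if a reload was triggered, False otherwise.
    """
    current = file_digest(printcap_path)
    try:
        with open(state_path) as handle:
            previous = handle.read().strip()
    except FileNotFoundError:
        previous = None  # first run: nothing recorded yet

    if current == previous:
        return False

    reload()
    # Record the digest only after the reload has been requested.
    with open(state_path, "w") as handle:
        handle.write(current)
    return True


def send_hup(pid):
    """Ask the daemon with process id `pid` to reread its printcap."""
    os.kill(pid, signal.SIGHUP)
```

A caller would run something like `reload_if_changed("/etc/printcap", state_file, lambda: send_hup(lpd_pid))` after each regeneration of the printcap, where `state_file` and `lpd_pid` are placeholders that the site would have to supply.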

This leaves one question to be answered: why would the printcap file on the servers be incorrect in the first place? We believe that recent events suggest a reason for this.

There has been a longstanding bug in the network component which caused machines with multiple Ethernet interfaces (including the print servers) to occasionally have the wrong IP address assigned to an interface. We believe that this bug has occasionally been responsible for pcap-query getting the wrong IP address returned for the server it was running on, and therefore producing client rather than server printcap entries. This would only cause printing problems if the faulty printcap was in place when the LPD daemon was restarted at 1:00am or 8:00am.
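A faulty file of this kind could in principle be caught before LPD reads it. The Python sketch below checks whether a printcap entry on a server forwards jobs to a queue on the server itself, which is the loop described earlier. It assumes that client entries name their destination in a `:lp=queue@host` form and that the caller supplies the set of names the server is known by; both are assumptions for illustration and not a description of the checks actually in place.

```python
import re

# Matches a remote queue destination of the assumed form ":lp=queue@host"
# and captures the host part.
REMOTE_LP = re.compile(r":lp=[^:@\s]+@([^:\s]+)")


def forwards_to_self(printcap_entry, local_names):
    """Return True if the entry sends jobs to a queue on this host."""
    match = REMOTE_LP.search(printcap_entry)
    if match is None:
        # No remote queue destination: the entry does not forward anywhere.
        return False
    target = match.group(1).lower()
    return target in {name.lower() for name in local_names}


# Hypothetical entries; host names and addresses are illustrative only.
client_style = "lp1:\n  :lp=lp1@printserver.example.org:\\\n  :sd=/var/spool/lpd/lp1\n"
server_style = "lp1:\n  :lp=192.0.2.50%9100:\\\n  :sd=/var/spool/lpd/lp1\n"
names = {"printserver.example.org", "printserver"}

loop_detected = forwards_to_self(client_style, names)
no_loop = forwards_to_self(server_style, names)
```

Because the check compares host names rather than IP addresses, it would not depend on the interface addresses that the network component bug affects.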

In the week beginning the 30th of April 2007, something increased the severity of this bug to the point that, on many multi-interfaced machines, the network component was always assigning incorrect addresses to interfaces. We believe that a new network component schema distributed on the afternoon of the 29th of April may have been responsible for this.

Given this, we can now explain the events of the morning of Tuesday the 1st of May, when the second form of the printing problem appeared at varying times on all the print servers. Since a link between the updating of the configuration and printcap files and the appearance of the printing problem had been suggested, the frequency at which the configuration and printcap files were updated had been reduced to once a day, at 1:00am and 8:00am respectively. On the afternoon of the 29th, the network schema changed, causing the behaviour of the network component bug to change. At 1:00am on the 30th the LPD daemon was restarted, but this had no effect on the print service since the printcap file had not changed. At 8:00am on the 30th, the printcap file was recreated with client rather than server printcap entries, but since the LPD daemon was not restarted, this also had no immediate effect on the print servers. At 1:00am on the 1st of May, the config files were once more rewritten, the LPD daemon was restarted and the incorrect printcap entries were read in by LPD. The appearance of the problem on each of the print servers corresponds exactly to the time the first print job was sent to the server after LPD restarted.
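The sequence above can be replayed as a small simulation. This Python sketch is only a model of the reasoning, not of the real software: it tracks which kind of entry is in the printcap file and which kind LPD has cached, assuming that a restart copies the file state into the cache and that regenerating the file leaves the cache untouched. The event names and the `simulate` function are invented for illustration.

```python
from datetime import datetime


def simulate(events, file_kind="server"):
    """Replay events and return (time, file_kind, cached_kind) after each one.

    Assumes LPD starts with a cache matching the initial file contents.
    """
    cached_kind = file_kind
    history = []
    for when, action, new_kind in events:
        if action == "regenerate":
            # The printcap file is rewritten; LPD's cache is unaffected.
            file_kind = new_kind
        elif action == "restart":
            # LPD restarts and rereads the file as it currently stands.
            cached_kind = file_kind
        history.append((when, file_kind, cached_kind))
    return history


# Events taken from the account above (after the schema change on the 29th).
events = [
    (datetime(2007, 4, 30, 1, 0), "restart", None),
    (datetime(2007, 4, 30, 8, 0), "regenerate", "client"),
    (datetime(2007, 5, 1, 1, 0), "restart", None),
]

history = simulate(events)
first_bad = next(when for when, _, cached in history if cached == "client")
```

In this model the file is wrong from 8:00am on the 30th, but `first_bad` is 1:00am on the 1st of May, which matches the observation that the fault only became visible after the next restart.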

The network component has now been updated to patch the bug which caused

-- CraigStrachan - 07 May 2007

Topic revision: r2 - 07 May 2007 - 14:29:00 - CraigStrachan
 