RS-232 serial console provision - 2016 review
1. Introduction
We currently provide Lantronix SLC boxes in all of our server rooms, so that machines can be given remote serial consoles over direct RS-232 serial links. Each such box can handle up to 32 serial consoles, and all such consoles are handled via our conserver infrastructure.
The SLC boxes we have are getting long in the tooth, and we can expect them to progressively fail. In addition, the particular SLC model we use is no longer produced by the manufacturer. The original purchase price of each SLC was about £2K-£3K; a comparable modern replacement unit can be expected to cost about the same.
We would expect that almost all modern servers purchased by us would support serial consoles via IPMI 'serial-over-lan' (a.k.a. 'SOL') - i.e. no modern server should need RS-232 serial console provision.
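For reference, an IPMI SOL console is typically opened with `ipmitool` over its `lanplus` interface. The following is a minimal sketch of building that command line; the function name, BMC hostname and account name are hypothetical, and the password is assumed to be supplied in the `IPMI_PASSWORD` environment variable (which `ipmitool -E` reads). It is not a description of how our conserver setup invokes SOL.

```python
import shlex
from typing import List

# SOL subcommands of ipmitool that this sketch allows.
SOL_ACTIONS = ("activate", "deactivate", "info")


def sol_command(bmc_host: str, user: str, action: str = "activate") -> List[str]:
    """Build an ipmitool command line for a serial-over-LAN session.

    bmc_host and user are illustrative values supplied by the caller.
    -I lanplus selects the IPMI v2.0 LAN interface, which SOL requires;
    -E makes ipmitool read the password from IPMI_PASSWORD.
    """
    if action not in SOL_ACTIONS:
        raise ValueError(f"unsupported SOL action: {action}")
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E",
            "sol", action]


if __name__ == "__main__":
    # Hypothetical BMC hostname and account, for illustration only.
    print(shlex.join(sol_command("example-bmc.example.org", "admin")))
```

Passing `"deactivate"` as the action builds the command that clears a stale SOL session, which is usually the first thing to try when a SOL console stops responding.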
The purpose of this note is to review current provision and usage, and to suggest a plan for the next few years.
2. Observations
- Current usage of remote serial consoles implemented via RS-232 is summarized in Appendix A. The tables there were produced after cleaning up the current conserver setup to remove consoles that had been left configured after the servers in question were removed.
Note:
- It is clear that some of our modern(-ish) servers have, in the past, been cabled and configured for both IPMI SOL consoles and RS-232 consoles. It is not clear why that has been done: it should never be necessary, and it should certainly never be standard practice.
- It is not clear why certain modern servers listed in Appendix A are indeed using RS-232 serial consoles, rather than IPMI SOL consoles. There might be good reasons - perhaps support for IPMI SOL was deficient on some or all of the models of server involved? - but it might also be that certain of those machines can and should now be reconfigured to use SOL consoles.
- Appendix A makes it obvious that we are now over-provisioned in SLC port capacity - but, at the same time, we have no spare SLC units. The over-provisioning implies that there is no justification for buying additional SLCs as spares; I suggest that the obvious thing to do is to gracefully retire some of our existing SLC boxes, in a way that causes us minimal disruption, in order to give us a pool of spares which can be used in the future to deal with hardware failures.
- Experience shows that remote serial consoles implemented via IPMI SOL are still not as reliable as those implemented via direct RS-232 links: BMCs can crash or hang, leading to SOL serial consoles becoming unresponsive. Nevertheless, we expect to further standardise on the use of IPMI SOL, not least because that technology requires both fewer resources and less expenditure. We would expect BMC implementations to further improve over time; and, in order to preempt problems, we need to apply any available BMC firmware updates in as timely a manner as possible.
- Regarding the provision in the self-managed server room:
- The logs show that no regular use is being made of the remote serial console service we provide: owners obviously use a physical screen/keyboard when working on machines in the room.
- The configuration of the service is rotting: users no longer officially present in the School are still listed as serial console 'owners.' Such rotting presumably goes unnoticed precisely because the service is never used.
- We provide a dedicated BMC/SOL subnet which can be used by the machine owners to provide an equivalent remote console service via IPMI for any modern machine. (And, we want to discourage the use of old machines in the room.)
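The over-provisioning noted above can be put in numbers. The following is a rough sketch only: the unit and console counts are simply tallied from the tables in Appendix A, every SLC is assumed to offer the full 32 ports, and the function names are illustrative.

```python
PORTS_PER_SLC = 32  # each SLC "can handle up to 32 serial consoles"

# room -> (number of SLC units, number of RS-232 consoles listed in Appendix A)
ROOMS = {
    "Forum server room": (9, 40),
    "Forum self-managed server room": (1, 8),
    "AT basement server room": (2, 13),
    "KB server room": (1, 6),
}


def utilisation(units, consoles, ports_per_unit=PORTS_PER_SLC):
    """Fraction of the available SLC ports that are actually in use."""
    return consoles / (units * ports_per_unit)


def summarise(rooms):
    """Return (room, consoles, capacity, fraction used) for each room."""
    rows = []
    for name, (units, consoles) in rooms.items():
        capacity = units * PORTS_PER_SLC
        rows.append((name, consoles, capacity, utilisation(units, consoles)))
    return rows


if __name__ == "__main__":
    rows = summarise(ROOMS)
    for name, consoles, capacity, fraction in rows:
        print(f"{name}: {consoles}/{capacity} ports in use ({fraction:.0%})")
    total_consoles = sum(row[1] for row in rows)
    total_capacity = sum(row[2] for row in rows)
    print(f"Overall: {total_consoles}/{total_capacity} ports in use")
```

On these counts, 67 consoles occupy 416 ports across 13 units, i.e. roughly one port in six is in use.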
3. Proposals
- Remove four Lantronix SLCs - srslc01, srslc03, srslc05, and srslc07 - from the Forum server room; reset all to factory default; and keep in storage as general spares. Relocate existing RS-232 serial connections from srslc01 to srslc00; srslc03 to srslc02; etc. (Relocating existing connections from box to box will admittedly result in a certain amount of cable messiness, but that seems an acceptable price to pay under the overall circumstances. In any case, we will have maintained our overall rack layout in the Forum server room, and the cabling within the racks, as 'sets of three.')
- Withdraw the conserver-managed RS-232 serial console service in the Forum self-managed server room. Remove the single Lantronix SLC - smslc00 - from that room; reset it to factory default; and keep in storage as a general spare.
- Leave the current Lantronix SLC provision in AT and KB as-is. If an SLC subsequently fails in AT, either consolidate all RS-232 serial connections on the remaining good unit, or replace the failed unit from spares.
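As a sanity check on the proposals above, the sketch below confirms that the consolidations fit within a single 32-port unit. It uses only the sizes of the tables in Appendix A, without assuming which table belongs to which SLC, so the pairwise figure is a worst case. The names are illustrative, and the check covers port capacity only, not cable reach.

```python
from itertools import combinations

PORTS_PER_SLC = 32  # each SLC "can handle up to 32 serial consoles"

# Number of machines in each table of A.1.2 (Forum) and A.3.2 (AT).
FORUM_TABLE_SIZES = [5, 10, 6, 13, 6]
AT_TABLE_SIZES = [7, 6]


def worst_pairwise_merge(sizes):
    """Largest console count that merging any two units could produce."""
    return max(a + b for a, b in combinations(sizes, 2))


def fits_on_one_unit(sizes, ports_per_unit=PORTS_PER_SLC):
    """True if all listed consoles could be carried by a single SLC."""
    return sum(sizes) <= ports_per_unit


def spare_pool(retired_forum, retired_self_managed):
    """Number of SLCs held in storage once the proposals are carried out."""
    return retired_forum + retired_self_managed


if __name__ == "__main__":
    print("Worst-case Forum merge:", worst_pairwise_merge(FORUM_TABLE_SIZES))
    print("AT fits on one unit:", fits_on_one_unit(AT_TABLE_SIZES))
    print("Spares after proposals:", spare_pool(4, 1))
```

On these figures the largest possible pairwise merge in the Forum server room is 23 consoles and the AT total is 13, both within one unit, and the proposals would leave five SLCs in storage.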
Appendix A. Current provision
A.1. Forum server room
A.1.1 Physical layout
+----------------------------- A I R C O N U N I T S ------------------------------+
| 'Self-managed' racks Server racks |
| +-----+-----+---------+ +-----+-----+-----+-----+-----+-----+-----+-----+-----+ |
| | R16 | R15 | Shelves | | R14 | R13 | R12 | R11 | R10 | R09 | R08 | R07 | R06 | |
| +-----+-----+---------+ +-----+-----+-----+-----+-----+-----+-----+-----+-----+ |
Door || |
|| 'Fibrechannel' racks |
+------------------+ +--------+-----+-----+-----+-----+-----+-----+ |
| | Desk | R05 | R04 | R03 | R02 | R01 | R00 | |
| +--------+-----+-----+-----+-----+-----+-----+ |
. .
A.1.2 Lantronix SLC usage
srslc00 (in Rack 0) | srslc01 (in Rack 2) | srslc02 (in Rack 3) | srslc03 (in Rack 5) |
Machine name | Model |
enceladus | PE1950 |
ifev01 | Disc array |
jupiter1 | PE1950 |
jupiter2 | PE1950 |
jupiter3 | PE1950 |
|
|
|
|
srslc04 (in Rack 6) | srslc05 (in Rack 8) | srslc06 (in Rack 9) | srslc07 (in Rack 11) |
|
Machine name | Model |
brendel | R200 |
cup02 | R410 |
fenrir | R200 |
hp1 | HP DL120 |
hp2 | HP DL120 |
hp3 | HP DL120 |
mckinley | R610 |
mercury | PE1850 |
scargill | PE1850 |
victor | R610 |
|
Machine name | Model |
arcsim | R410 |
blanik | R510 |
bocian | R510 |
bonnybridge | Viglen GPU |
catzilla | R715 |
schaffner | Viglen GPU |
|
Machine name | Model |
adamski | Viglen GPU |
dechmont | Viglen GPU |
hcrc1425n04 | SC1425 |
hcrc1425n06 | SC1425 |
hcrc1425n09 | SC1425 |
hcrc1425n10 | SC1425 |
hcrc1425n25 | SC1425 |
hcrc1425n28 | SC1425 |
lazar | Viglen GPU |
hynek | Viglen GPU |
mayer | Viglen GPU |
pasta | PE1950 |
puma | PE1950 |
|
srslc08 (in Rack 15) |
Machine name | Model |
haggis | Desktop |
melmac | Desktop |
neep | Desktop |
porthemmet | Desktop |
rendlesham | Desktop |
tatties | Desktop |
|
A.2. Forum self-managed server room
A.2.1 Physical layout
A single Lantronix SLC - smslc00 - located in one of the central racks.
A.2.2 Lantronix SLC usage
Machine name |
hypnos |
ir |
mir |
nrg |
nyx |
synprot |
sperrin |
supersonic |
A.3. AT basement server room
A.3.1 Physical layout
+-------+-------+-------+-------+-----+-----+-----+-----+-----+
| Rack0 | Rack1 | Rack2 | Rack3 |MSc0 |MSc1 |CDT0 |CDT1 |CDT2 |
+-------+-------+-------+-------+-----+-----+-----+-----+-----+
+-------------+
| Informatics |
| comms area |
+-------------+
A.3.2 Lantronix SLC usage
atslc00 (in Rack 0) | atslc01 (in Rack 2) |
Machine name | Model |
atc0 | HP switch |
atc1 | HP switch |
burly | R610 |
cigar | PE2950 |
circle | R710 |
cup03 | R410 |
darwin | R200 |
|
Machine name | Model |
atabeast1 | Disc array |
blackwell | R610 |
satablade1 | Disc array |
schiff | R200 |
skoll | R210 |
stoater | PE2850 |
|
A.4. KB server room
A.4.1 Physical layout
A single Lantronix SLC - kbslc00 - located in one of the racks.
A.4.2 Lantronix SLC usage
Note: The following connections have not been physically checked in the course of this exercise.
Machine name | Model |
ataboy1 | Disc array |
cake | PE2950 |
hati | R210 |
kbevo21 | Disc array |
satabeast1 | Disc array |
sataboy1 | Disc array |
--
IanDurkacz - 01 Aug 2016