RS-232 serial console provision - 2016 review

1. Introduction

We currently provide Lantronix SLC boxes in all of our server rooms so that remote serial consoles can be provided over direct RS-232 serial links. Each box can handle up to 32 serial consoles, and all such consoles are managed via our conserver infrastructure.

The SLC boxes we have are getting long in the tooth, and we can expect them to fail progressively. In addition, the particular SLC model we use is no longer produced by the manufacturer. The original purchase price of each SLC was about 2K-3K; a comparable modern replacement unit can be expected to cost about the same.

We would expect that almost all modern servers purchased by us would support serial consoles via IPMI 'serial-over-lan' (a.k.a. 'SOL') - i.e. no modern server should need RS-232 serial console provision.
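For readers unfamiliar with SOL, the following is a minimal sketch of one common way to reach such a console, using the ipmitool command-line client over its 'lanplus' (IPMI v2.0) interface. The BMC hostname and username are hypothetical placeholders, and the sketch says nothing about how our conserver setup actually drives SOL consoles.

```python
import os
import subprocess


def sol_command(bmc_host, user):
    """Build the ipmitool argument list for an interactive SOL session.

    -I lanplus selects the IPMI v2.0 LAN interface, which SOL requires;
    -E tells ipmitool to read the password from the IPMI_PASSWORD
    environment variable rather than from the command line.
    """
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host,
            "-U", user, "-E", "sol", "activate"]


def open_sol_console(bmc_host, user):
    """Run the SOL session in the foreground; returns ipmitool's exit status."""
    if "IPMI_PASSWORD" not in os.environ:
        raise RuntimeError("set IPMI_PASSWORD before opening a SOL console")
    return subprocess.call(sol_command(bmc_host, user))


# Hypothetical names, for illustration only; this just prints the command.
print(" ".join(sol_command("example-bmc.example.org", "consoleuser")))
```

Only the command line is printed here; calling open_sol_console() would attempt a real connection to the named BMC.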

The purpose of this note is to review current provision and usage, and to suggest a plan for the next few years.

2. Observations

  1. Current usage of remote serial consoles implemented via RS-232 is summarized in Appendix A. The tables therein were arrived at after some cleaning up of the current conserver setup in order to remove consoles left in place despite the servers in question having been removed.
    Note:
    1. It is clear that some of our modern(-ish) servers have, in the past, been cabled and configured for both IPMI SOL consoles and RS-232 consoles. It is not clear why that has been done: it should never be necessary, and it should certainly never be standard practice.
    2. It is not clear why certain modern servers listed in Appendix A are indeed using RS-232 serial consoles, rather than IPMI SOL consoles. There might be good reasons - perhaps support for IPMI SOL was deficient on some or all of the models of server involved? - but it might also be that certain of those machines can and should now be reconfigured to use SOL consoles.
  2. Appendix A makes it obvious that we are now over-provisioned in SLC port capacity but, at the same time, we have no spare SLC units. The over-provisioning implies that there is no justification for buying additional SLCs as spares. I suggest that the obvious thing to do is to gracefully retire some of our existing SLC boxes, in a way that causes us minimal disruption, in order to give us a pool of spares which can be used in the future to deal with hardware failures.
  3. Experience shows that remote serial consoles implemented via IPMI SOL are still not as reliable as those implemented via direct RS-232 links: BMCs can crash or hang, leading to SOL serial consoles becoming unresponsive. Nevertheless, we expect to further standardise on the use of IPMI SOL, not least because that technology requires both fewer resources and less expenditure. We would expect BMC implementations to improve further over time; and, in order to preempt problems, we need to apply any available BMC firmware updates in as timely a manner as possible.
  4. Regarding the provision in the self-managed server room:
    1. The logs show that no regular use is being made of the remote serial console service we provide: owners obviously use a physical screen/keyboard when working on machines in the room.
    2. The configuration of the service is rotting: users no longer officially present in the School are still listed as serial console 'owners.' Such rotting presumably goes unnoticed precisely because the service is never used.
    3. We provide a dedicated BMC/SOL subnet which can be used by the machine owners to provide an equivalent remote console service via IPMI for any modern machine. (And, we want to discourage the use of old machines in the room.)
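The over-provisioning noted in observation 2 can be put into numbers. The sketch below is a rough tally only: the per-unit counts were taken by hand from the tables in Appendix A, and it assumes that every unit is the 32-port model mentioned in the introduction.

```python
# Consoles currently connected to each SLC, counted from Appendix A.
PORTS_PER_SLC = 32  # assumption: every unit is the 32-port model

consoles_in_use = {
    # Forum server room
    "srslc00": 5, "srslc01": 2, "srslc02": 2, "srslc03": 3, "srslc04": 5,
    "srslc05": 10, "srslc06": 6, "srslc07": 13, "srslc08": 6,
    # Forum self-managed server room
    "smslc00": 8,
    # AT basement server room
    "atslc00": 7, "atslc01": 6,
    # KB server room
    "kbslc00": 6,
}


def utilisation(counts, ports_per_unit=PORTS_PER_SLC):
    """Return (ports used, total ports, fraction used) across all units."""
    used = sum(counts.values())
    capacity = len(counts) * ports_per_unit
    return used, capacity, used / capacity


used, capacity, fraction = utilisation(consoles_in_use)
print(f"{used} of {capacity} ports in use ({fraction:.0%})")
# prints: 79 of 416 ports in use (19%)
```

On these figures no single unit is more than about 40% full (srslc07, with 13 of 32 ports in use).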

3. Proposals

  1. Remove four Lantronix SLCs - srslc01, srslc03, srslc05, and srslc07 - from the Forum server room; reset all to factory default; and keep in storage as general spares. Relocate existing RS-232 serial connections from srslc01 to srslc00; srslc03 to srslc02; etc. (Relocating existing connections from box to box will admittedly result in a certain amount of cable messiness, but that seems an acceptable price to pay under the overall circumstances. In any case, we will have maintained our overall rack layout in the Forum server room, and the cabling within the racks, as 'sets of three.')
  2. Withdraw the conserver-managed RS-232 serial console service in the Forum self-managed server room. Remove the single Lantronix SLC - smslc00 - from that room; reset it to factory default; and keep in storage as a general spare.
  3. Leave the current Lantronix SLC provision in AT and KB as-is. If an SLC subsequently fails in AT, either consolidate all RS-232 serial connections on the remaining good unit, or replace the failed unit from spares.
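As a sanity check on proposal 1, the following sketch merges each retired unit's connections into its neighbour and confirms that every remaining unit stays within capacity. The counts are tallied by hand from Appendix A; the 32-port capacity and the pairings hidden behind 'etc.' (srslc05 to srslc04, srslc07 to srslc06) are assumptions.

```python
PORTS_PER_SLC = 32  # assumption: every unit is the 32-port model

# Consoles currently connected to each Forum server room SLC (Appendix A.1.2).
forum_consoles = {
    "srslc00": 5, "srslc01": 2, "srslc02": 2, "srslc03": 3, "srslc04": 5,
    "srslc05": 10, "srslc06": 6, "srslc07": 13, "srslc08": 6,
}

# Unit to retire -> unit receiving its connections. The last two pairs
# are an assumed reading of the 'etc.' in proposal 1.
moves = {
    "srslc01": "srslc00",
    "srslc03": "srslc02",
    "srslc05": "srslc04",
    "srslc07": "srslc06",
}


def consolidate(counts, moves):
    """Return per-unit counts after moving each retired unit's consoles."""
    merged = dict(counts)
    for source, target in moves.items():
        merged[target] += merged.pop(source)
    return merged


after = consolidate(forum_consoles, moves)
overfull = {name: n for name, n in after.items() if n > PORTS_PER_SLC}

for name in sorted(after):
    print(f"{name}: {after[name]} of {PORTS_PER_SLC} ports")
print("units over capacity:", sorted(overfull) if overfull else "none")
```

On these figures the busiest remaining unit would be srslc06 with 19 connections, so all four moves fit comfortably. Together with smslc00 from proposal 2, this would give a pool of five spare units.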

Appendix A. Current provision

A.1. Forum server room

A.1.1 Physical layout

     +----------------------------- A I R C O N   U N I T S  ------------------------------+
     |   'Self-managed' racks                         Server racks                         |
     |  +-----+-----+---------+   +-----+-----+-----+-----+-----+-----+-----+-----+-----+  |
     |  | R16 | R15 | Shelves |   | R14 | R13 | R12 | R11 | R10 | R09 | R08 | R07 | R06 |  |
     |  +-----+-----+---------+   +-----+-----+-----+-----+-----+-----+-----+-----+-----+  |
Door ||                                                                                    |
     ||                                               'Fibrechannel' racks                 |
     +------------------+                  +--------+-----+-----+-----+-----+-----+-----+  |
                        |                  |  Desk  | R05 | R04 | R03 | R02 | R01 | R00 |  |
                        |                  +--------+-----+-----+-----+-----+-----+-----+  |
                        .                                                                  .

A.1.2 Lantronix SLC usage

srslc00 (in Rack 0)
Machine name  Model
enceladus     PE1950
ifev01        Disc array
jupiter1      PE1950
jupiter2      PE1950
jupiter3      PE1950

srslc01 (in Rack 2)
Machine name  Model
broom         PE2950
orator        PE850

srslc02 (in Rack 3)
Machine name  Model
linnaeus      R200
satabeast2    Disc array

srslc03 (in Rack 5)
Machine name  Model
core0         HP switch
core1         HP switch
core2         HP switch

srslc04 (in Rack 6)
Machine name  Model
cup01         R410
dalfaber      HP DL120
staffa        R710
venus         PE1850
wafer         R610

srslc05 (in Rack 8)
Machine name  Model
brendel       R200
cup02         R410
fenrir        R200
hp1           HP DL120
hp2           HP DL120
hp3           HP DL120
mckinley      R610
mercury       PE1850
scargill      PE1850
victor        R610

srslc06 (in Rack 9)
Machine name  Model
arcsim        R410
blanik        R510
bocian        R510
bonnybridge   Viglen GPU
catzilla      R715
schaffner     Viglen GPU

srslc07 (in Rack 11)
Machine name  Model
adamski       Viglen GPU
dechmont      Viglen GPU
hcrc1425n04   SC1425
hcrc1425n06   SC1425
hcrc1425n09   SC1425
hcrc1425n10   SC1425
hcrc1425n25   SC1425
hcrc1425n28   SC1425
lazar         Viglen GPU
hynek         Viglen GPU
mayer         Viglen GPU
pasta         PE1950
puma          PE1950

srslc08 (in Rack 15)
Machine name  Model
haggis        Desktop
melmac        Desktop
neep          Desktop
porthemmet    Desktop
rendlesham    Desktop
tatties       Desktop

A.2. Forum self-managed server room

A.2.1 Physical layout

A single Lantronix SLC - smslc00 - located in one of the central racks.

A.2.2 Lantronix SLC usage

Machine name
hypnos
ir
mir
nrg
nyx
synprot
sperrin
supersonic

A.3. AT basement server room

A.3.1 Physical layout

+-------+-------+-------+-------+-----+-----+-----+-----+-----+
| Rack0 | Rack1 | Rack2 | Rack3 |MSc0 |MSc1 |CDT0 |CDT1 |CDT2 |
+-------+-------+-------+-------+-----+-----+-----+-----+-----+

+-------------+
| Informatics |
| comms area  |
+-------------+

A.3.2 Lantronix SLC usage

atslc00 (in Rack 0)
Machine name  Model
atc0          HP switch
atc1          HP switch
burly         R610
cigar         PE2950
circle        R710
cup03         R410
darwin        R200

atslc01 (in Rack 2)
Machine name  Model
atabeast1     Disc array
blackwell     R610
satablade1    Disc array
schiff        R200
skoll         R210
stoater       PE2850

A.4. KB server room

A.4.1 Physical layout

A single Lantronix SLC - kbslc00 - located in one of the racks.

A.4.2 Lantronix SLC usage

Note: The following connections have not been physically checked in the course of this exercise.

Machine name  Model
ataboy1       Disc array
cake          PE2950
hati          R210
kbevo21       Disc array
satabeast1    Disc array
sataboy1      Disc array

-- IanDurkacz - 01 Aug 2016

Topic revision: r8 - 05 Sep 2016 - 13:50:24 - RossArmstrong
 