Infrastructure planning for GPU procurement, 2019

Contents

1. Expected equipment

1.1 Compute nodes

  1. Definitely 10 nodes - and possibly another 3
  2. Each node 4U
  3. Each node carries 8 GPUs
  4. Each node equipped with 3 PSUs in 2+1 redundant configuration
  5. Max. power consumption of each node: 2.7 kW (=> 11A @250V; see the budget sketch below)

Requirements:

  • Physical space: 40U, up to 52U
  • Power: 27kW (110A @250V), up to 35kW (143A @250V)
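
The space and power figures above follow directly from the per-node numbers. A minimal sketch of the arithmetic (Python; assumes 4U, 2.7 kW max and a 250V supply per node, as listed above):

  import math

  NODE_POWER_KW = 2.7    # max. power draw per compute node
  SUPPLY_VOLTAGE = 250   # volts
  NODE_HEIGHT_U = 4      # rack units per node

  def node_budget(nodes):
      """Space, power and current budget for a given number of compute nodes."""
      # 2700 W / 250 V = 10.8 A; rounded up to 11 A per node, as in the text.
      amps_per_node = math.ceil(NODE_POWER_KW * 1000 / SUPPLY_VOLTAGE)
      return nodes * NODE_HEIGHT_U, nodes * NODE_POWER_KW, nodes * amps_per_node

  for n in (10, 13):
      space, power, current = node_budget(n)
      print(f"{n} nodes: {space}U, {power:.1f} kW, {current} A")
  # 10 nodes: 40U, 27.0 kW, 110 A
  # 13 nodes: 52U, 35.1 kW, 143 A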

1.2 File servers

  1. 4U of servers
  2. Each server will have dual 10Gb/s NICs plus at least one Mgmt NIC
  3. Power consumption is negligible compared to that of the compute nodes

Requirements:

  • Physical space: 4U
  • Power: negligible

1.3 Networking

  1. All nodes will have dual 10Gb/s NICs plus at least one Mgmt NIC
  2. For bonding, require 2 switch ports per node => total of 20 ports, up to 26 (see the port-count check below)
  3. Ideally, a single 48-way 10Gb/s 1U switch would suffice; more likely, we will need one 48-way 1Gb/s 1U switch for general network connectivity, plus another n(?)-way 10Gb/s 1U switch for fast node interconnects

Requirements:

  • Physical space: 2U
  • Power: negligible
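
A quick port-count check for the bonded links (a sketch only; it counts two bonded 10Gb/s ports per compute node, as in item 2 above, and does not include file-server or uplink ports):

  PORTS_PER_NODE = 2    # dual 10Gb/s NICs, bonded
  SWITCH_PORTS = 48     # candidate 48-way 1U switch

  for nodes in (10, 13):
      needed = nodes * PORTS_PER_NODE
      spare = SWITCH_PORTS - needed
      print(f"{nodes} nodes: {needed} x 10Gb/s ports ({spare} spare on a 48-way switch)")
  # 10 nodes: 20 x 10Gb/s ports (28 spare on a 48-way switch)
  # 13 nodes: 26 x 10Gb/s ports (22 spare on a 48-way switch)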

1.4 Total requirements

  • Physical space: 46U, up to 58U
  • Power: 27kW (110A @250V), up to 35kW (143A @250V)

2. Location

There are two possible locations:

2.1 Appleton Tower basement

  1. Rack 3 (42U) is currently completely free and could be used for most of the equipment. It could house 10 nodes, and two 1U switches.
  2. The max current draw for ten nodes will be 110A.
  3. Rack 3 is currently served by two 32A PDUs, but a third could be arranged by some reorganization (and a 5m 32A commando extension cable). That would give an absolute maximum of 96A - which should suffice in practice (see the headroom check after this list).
  4. 4U of file servers could be fitted to the adjacent server rack, Rack 2, which currently has 11U free. We would need to watch the cable routing if there are many interconnections planned.
  5. The three additional nodes could be fitted to the 'Facebook' rack at the extreme end of our bank of racks.
  6. Paul Hutton of IS confirms that this additional total power load would be acceptable in AT. He is checking with Estates regarding cooling.
  7. If we install in AT, then Paul Hutton will require a mechanism for requesting a shutdown in the event of a cooling problem.
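
A minimal headroom check for item 3 above (assumes three 32A PDUs and the 11A-per-node rated maximum from section 1.1; typical sustained draw will be lower):

  PDU_RATING_A = 32
  AMPS_PER_NODE = 11    # rated max per node (2.7 kW @250V)

  def rack3_headroom(pdus, nodes):
      capacity = pdus * PDU_RATING_A
      rated_max = nodes * AMPS_PER_NODE
      return capacity, rated_max, capacity - rated_max

  print(rack3_headroom(pdus=3, nodes=10))    # (96, 110, -14)
  # The rated maximum (110A) exceeds the 96A PDU capacity, so this relies on
  # the nodes not all drawing their full 2.7 kW simultaneously.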

2.1.1 Additional items required:

  • One additional rack PDU at ~£700
  • One 5m 32A commando extension cable
  • For the 10Gb/s links, sufficient good quality Cat6 patch cables of appropriate length.

2.2 Forum server room IF-B.02

  1. We would need to buy two 42U racks to house the equipment. The obvious physical location for these is to the left of the existing 'GPU' rack.
  2. If we split the possible thirteen nodes 6/7 between the two racks, then the max load per rack will be 66A/77A. Each circuit in B.02 is rated at 20A, so this implies four PDUs per rack - eight in total. In practice, I would suggest that three PDUs per rack should suffice (see the sketch after this list).
  3. There are sufficient free circuits in the proposed rack location for either six or eight PDUs - see DB-0.14 power distribution diagram.
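
A sketch of the per-rack PDU arithmetic for item 2 above (assumes 20A circuits, the 11A-per-node rated maximum from section 1.1, and a 6/7 split of thirteen nodes):

  import math

  CIRCUIT_RATING_A = 20
  AMPS_PER_NODE = 11    # rated max per node (2.7 kW @250V)

  for nodes_in_rack in (6, 7):
      load = nodes_in_rack * AMPS_PER_NODE
      pdus = math.ceil(load / CIRCUIT_RATING_A)
      print(f"{nodes_in_rack} nodes: {load}A rated max => {pdus} x 20A PDUs")
  # 6 nodes: 66A rated max => 4 x 20A PDUs
  # 7 nodes: 77A rated max => 4 x 20A PDUs
  # In practice three PDUs per rack may suffice, since the rated maximum is
  # unlikely to be drawn continuously.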

2.2.1 Additional items required:

  • Two 42U racks at ~£700 each => £1400
  • Six additional rack PDUs at ~£700 each => £4200
  • For the 10Gb/s links, sufficient good quality Cat6 patch cables of appropriate length.

Note: Power to actively run all nodes in B.02 will not be available until the Forum server room UPS programme is complete. The current completion date for that programme is Wed 24 July 2019.

-- IanDurkacz - 24 May 2019
