CDT Cluster Software

This is the final report for CompProj:332

Description

This project delivered the full set of required software for the CDT cluster.

Operating system

From the outset the users wanted to keep their options open, and there was the possibility that students might be given bare-metal access to the nodes. The preferred options were, in order:
  • DICE
  • DICE with another OS on VirtualBox
  • DICE with another OS on KVM
  • Bare metal
As the project started in line with the development of DICE SL7 on desktops, we gave the cluster purchasers the choice of DICE 6 or DICE 7. It was envisioned that SL7 would involve more work installing the OS, but that this would be balanced by a reduction in the time spent building userspace software, since packages would be more up to date and much of the software would have had to be rebuilt for SL7 anyway. There was also a desire from the users for access to the latest versions of some software.

All the nodes were initially installed with the "desktop" installation of SL7. They were then re-installed with the final server version once it became available and as the nodes became free for re-installation.

Scheduler software

Initially the users were unsure whether they wanted to run a scheduler. About six months in it became apparent that the data-science people would benefit from running a scheduler in front of their GPU nodes. From the usual suspects, gridengine was chosen to match the environment at ECDF and because we had prior knowledge of it. Since Sun's purchase by Oracle, gridengine has gone back to being a closed-source application and ownership has passed to Univa; the available open-source versions are all forks of the last Sun release and much of the supporting documentation has gone. Son of Grid Engine was chosen as it was the version running on ECDF at the time; this has since changed, but we have stuck with our original choice.
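
For illustration, the sketch below (in Python, since much of the requested user software was Python-based) wraps qsub to submit a job against a queue that exposes GPUs as a consumable resource. The queue name, the 'gpu' complex and the script name are assumptions made for the example rather than the cluster's actual configuration.

    import subprocess

    def submit_gpu_job(script_path, gpus=1, queue="gpu.q"):
        """Submit a batch script via qsub, requesting `gpus` units of a
        consumable 'gpu' complex. Names here are illustrative only."""
        cmd = [
            "qsub",
            "-q", queue,           # target queue (name is an assumption)
            "-l", f"gpu={gpus}",   # request units of the consumable resource
            "-cwd",                # run from the submission directory
            script_path,
        ]
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return result.stdout.strip()   # qsub reports the new job id here

    if __name__ == "__main__":
        print(submit_gpu_job("train.sh", gpus=1))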

Shared filesystems

A prerequisite for gridengine is a shared filesystem for the gridengine spooling directories, which also has to be non-Kerberised. The original intention had been to use the School's GPFS licence to provide the filesystem, either using the existing nodes or purchasing storage nodes if required. Unfortunately the version of GPFS we had licensed could not be upgraded to work with the SL7 kernel, and purchasing new licences was likely to be too expensive.

In the short term we used one of the james nodes to provide an NFS fileserver for the gridengine spool filesystem and for home directories whilst we looked at a replacement distributed filesystem. Given that a high-performance filestore was deemed not to be a requirement for the original hardware purchase, the criteria for the filesystem were:

  • Well supported under Red Hat
  • Mature
  • No passwordless root login requirement
  • Expandable without requiring additional hardware purchases
  • Full shell access
  • Speed

We did a paper review of a number of filesystems, including pNFS, Gluster, Ceph, Lustre, HDFS, GFS and modern GPFS (as a baseline; we expected it to be prohibitively expensive), and chose Gluster because it met most of the criteria and was shipped as part of Red Hat Storage Server.
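
As a rough sketch of what standing up such a volume involves, the following shows how a small replicated Gluster volume might be created, started and mounted; the hostnames, brick paths and volume name are placeholders rather than the cluster's real layout.

    import subprocess

    # Hypothetical volume name, hosts and brick paths, purely for illustration.
    VOLUME = "cdt-home"
    BRICKS = ["node1:/export/brick1", "node2:/export/brick1"]

    def run(cmd):
        # Echo each command, then run it and fail loudly on error.
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # Create a two-way replicated volume across the bricks, then start it.
    run(["gluster", "volume", "create", VOLUME, "replica", "2"] + BRICKS)
    run(["gluster", "volume", "start", VOLUME])

    # Clients then mount the volume with the native client, e.g.
    #   mount -t glusterfs node1:/cdt-home /mnt/cdt-home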

Hadoop

There was an initial request to install Hadoop plus some software developed on top of it; however, this was never used and in the end we removed it to free up more user disk space. The development work was useful in setting up the newer Hadoop teaching cluster, so little time was actually wasted.

User software

A diverse set of user software was requested, much of it Python-based, and most of it caused few problems.

Time taken

Approximately 17 weeks.

Conclusions

In reality this was a cluster of more normally sized projects (OS, filesystem, scheduler, software, Hadoop) and it would have been better to have handled it with a small team (say, three people).

We should be wary of giving up expertise, and if we do return to a technology we should not assume that we can just pick up where we left off. In retrospect gridengine was probably a poor choice for scheduling GPUs: whilst it can manage the allocation of jobs to nodes, it cannot manage the GPUs on a node in the same way that it manages the CPU resources, and additional coding was needed to support this. It is also no longer a particularly well-supported solution.
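
As an illustration of the kind of per-node bookkeeping gridengine does not provide for GPUs, the sketch below claims a free device through a lock file and exports it via CUDA_VISIBLE_DEVICES. It is a minimal sketch of the problem, assuming a hypothetical lock directory and device count, not the code actually deployed on the cluster.

    import os
    import pathlib

    # Hypothetical lock directory and GPU count; the real scripts may differ.
    LOCK_DIR = pathlib.Path("/var/run/gpu-locks")
    NUM_GPUS = 4

    def claim_gpu():
        # Claim the first free GPU by atomically creating a lock file and
        # return its device index, or None if every device is taken.
        LOCK_DIR.mkdir(parents=True, exist_ok=True)
        for dev in range(NUM_GPUS):
            try:
                fd = os.open(LOCK_DIR / f"gpu{dev}.lock",
                             os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                os.write(fd, str(os.getpid()).encode())
                os.close(fd)
                return dev
            except FileExistsError:
                continue
        return None

    dev = claim_gpu()
    if dev is not None:
        # Restrict the job to the device it has claimed.
        os.environ["CUDA_VISIBLE_DEVICES"] = str(dev)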

Not really news, but projects where the customer has fairly woolly requirements tend to eat up a lot of time.

-- IainRae - 11 Oct 2017
