CDT Cluster Software

This is the final report for DevProj:332

Description

This project delivered the full set of required software for the CDT cluster.

Operating system

From the outset the users wanted to keep their options open, and there was the possibility that students might be given bare-metal access to the nodes. The preferred options were, in order:
  • DICE
  • DICE with another OS on VirtualBox
  • DICE with another OS on KVM
  • Bare metal
As the project started in line with the development of DICE SL7 on desktops, we gave the cluster purchasers the choice of DICE 6 versus DICE 7. It was envisaged that 7 would involve more work from the point of view of installing the OS, but that this would be balanced by a reduction in time spent building userspace software, both because packages would be more up to date and because much of the software would have had to be rebuilt for SL7 anyway. There was also a desire from the users for access to the latest versions of some software.

All the nodes were initially installed with the "desktop" installation of SL7. They were then re-installed with the final server version after it became available, as the nodes became free for re-install.

Scheduler software

Initially the users were unsure whether they wanted to run a scheduler. About six months in it became apparent that the data-science people would benefit from running a scheduler in front of their GPU nodes. From the usual suspects Gridengine was chosen, in order to match the environment at ECDF and because we had prior knowledge of it. Since Sun's purchase by Oracle, Gridengine has gone back to being a closed-source application and ownership has passed to Univa; the available open-source versions are all forks of the last Sun release, and much of the supporting documentation has gone. Son of Grid Engine was chosen as this was the version running on ECDF at the time. This has since changed, but we have stuck with our original choice.

Shared filesystems

A prerequisite for Gridengine is a shared filesystem for the Gridengine spooling directories, and also a non-Kerberised filesystem. The original intention had been to use the School's GPFS licence to provide the filesystem, either using the existing nodes or purchasing storage nodes if required. Unfortunately the version of GPFS which we had licensed could not be upgraded to work with the SL7 kernel, and purchasing new licences was likely to be too expensive.

In the short term we used one of the james nodes to provide an NFS fileserver for the Gridengine spool filesystem and for home directories whilst we looked at a replacement distributed filesystem. Given that a high-performance filestore was deemed not to be a requirement for the original hardware purchase, the criteria for the filesystem were:

  • Well supported under Red Hat
  • Mature
  • No passwordless root login requirement
  • Expandable without requiring additional hardware purchase
  • Full shell access
  • Speed

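For reference, the interim NFS arrangement described above might have looked something like the following /etc/exports fragment on the fileserver node. This is an illustrative sketch only: the paths and host pattern are invented, not the actual cluster configuration.

```shell
# Illustrative /etc/exports entries for the interim fileserver.
# Gridengine's spool area must be writable by every execution host,
# and the export is non-Kerberised (AUTH_SYS only, sec=sys).
/export/sge-spool  node*.example.org(rw,sync,sec=sys,no_subtree_check)
/export/home       node*.example.org(rw,sync,sec=sys,no_subtree_check)
```

After editing the file, `exportfs -ra` on the server re-reads the export table without a restart.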
We did a paper review of a number of filesystems, including pNFS, Gluster, Ceph, Lustre, HDFS, GFS and modern GPFS (as a baseline; we expected it to be prohibitively expensive), and chose Hadoop. There was an initial request to install Hadoop plus some software developed on top of it; however, this was never used and in the end we removed it to free up more user diskspace. The development work was useful in setting up the newer Hadoop teaching cluster, so little time was actually wasted.

User software

A diverse set of user software was requested, much of it Python-based, and most of it presented few problems.

Target audience

The survey was targeted at all staff and research students in order to cover people in the school who:
  • Specify services for teaching and research.
  • Are most likely to handle sensitive data
  • Are most likely to be responsible for I.P.

In general we got a reasonable response, although the return from admin staff was disappointing.

Time taken

Amount: 5 weeks. Some of this included learning to use Webmark (probably 3-4 days), and there was a further 3-4 days spent playing with various bits of software in order to decide how to generate the graphs for the report.

Conclusions

Webmark is fairly good at generating this kind of survey. The only real let-down was that it wasn't possible to generate graphs straight from the returns, but this would have been beyond the scope of the software as originally specced, so it's an unreasonable request.

oocalc is useful for quickly throwing together results from CSV files, but it rapidly becomes unmanageable as the complexity of the data, and of the questions you would like to ask of it, increases. Generating results for conditional queries involving multiple responses (i.e. queries along the lines of "Of the people who are currently using VMs, how many are interested in a KVM service, and how many of them would be willing to pay?") quickly becomes fraught with the possibility of error. In retrospect, dumping the results into a database and using Python or R would have been a better approach, albeit with a higher learning curve.
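As a sketch of the database approach, the quoted conditional query becomes a one-line SQL statement once the returns are loaded into SQLite. The column names and sample rows below are invented for illustration; they are not the actual survey fields.

```python
# Minimal sketch: load survey CSV returns into an in-memory SQLite
# database and answer a conditional multi-response question with SQL.
# Column names (uses_vms, wants_kvm, would_pay) are illustrative only.
import csv
import io
import sqlite3

# Stand-in for the exported survey returns file.
csv_text = """respondent,uses_vms,wants_kvm,would_pay
r1,yes,yes,yes
r2,yes,no,no
r3,no,yes,no
r4,yes,yes,no
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE survey "
    "(respondent TEXT, uses_vms TEXT, wants_kvm TEXT, would_pay TEXT)"
)
rows = list(csv.DictReader(io.StringIO(csv_text)))
conn.executemany(
    "INSERT INTO survey VALUES (:respondent, :uses_vms, :wants_kvm, :would_pay)",
    rows,
)

# "Of the people who are currently using VMs, how many are interested in
# a KVM service, and how many of them would be willing to pay?"
interested, paying = conn.execute(
    "SELECT SUM(wants_kvm = 'yes'), "
    "       SUM(wants_kvm = 'yes' AND would_pay = 'yes') "
    "FROM survey WHERE uses_vms = 'yes'"
).fetchone()
print(interested, paying)  # → 2 1 for the sample rows above
```

Each new cross-question is then a change to one SELECT rather than another layer of spreadsheet formulas, which is where the error-proneness mentioned above came from.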

We should have had some of the analysis software in place during the trial, because we missed out on some information relating to multiple questions when people responded in ways we didn't quite expect.

More by luck than judgement, we seem to have hit a sweet spot with the time we sent the survey out (8.30): most of the responses came back fairly quickly, and we seem to have caught people while they were responsive, before they'd settled down to work for the day.

-- IainRae - 04 Aug 2014

-- IainRae - 11 Oct 2017