Merged MLP/MSc Teaching Clusters

Overview

This project's main aim is to merge the MSc teaching cluster and the MLP cluster to provide a full production cluster suitable for use by all teaching students. The project involves a number of sub-projects and will produce a number of deliverables which can be used individually elsewhere (slurm component, distributed filesystem, file distribution system, etc.). It runs in parallel to the MLP hardware project [link]. We will reuse this technology and some of the infrastructure for the CDT cluster and a possible future research GPU cluster.

The starting point for the project is the MLP cluster as it stood at the end of the MLP course last year, when it had this configuration:

  • /home filesystem based on the 5 letha nodes using gluster
  • slurm scheduler running on a VM
    • configuration delivered by file component
    • modified FIFO scheduler
    • account admin tasks done by hand
  • 3 head nodes made up of one new server and two hand-me-downs
  • 25 GPU nodes spread over 3 sites
  • basic monitoring using nagios
  • basic performance monitoring using ganglia

We are aiming to move to a full production system which will involve:

  • LCFG configured slurm
    • accounts created and archived automatically driven by capabilities
    • fair share prioritisation driven by capabilities
    • Low-priority use (with preemption) of underused nodes, available to users based on priority
  • File distribution system: a low-latency method of deploying files to nodes' local disks
    • probably using torrent
    • Ideally integrated with the scheduler to deliver files to scheduled nodes
  • LCFG configured distributed filesystem
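
As an illustration of what "accounts created and archived automatically driven by capabilities" could mean in practice, here is a minimal sketch of the reconciliation step. It assumes we can obtain two lists of usernames: those currently holding the relevant capability and those with an existing scheduler account. The function name and the example usernames are hypothetical, and the sketch only computes what would change; it does not talk to slurm or LCFG.

```python
def reconcile_accounts(capability_holders, existing_accounts):
    """Work out which accounts to create and which to archive.

    capability_holders: usernames that currently hold the capability
    existing_accounts:  usernames that currently have a scheduler account
    Returns (to_create, to_archive) as sorted lists.
    """
    holders = set(capability_holders)
    existing = set(existing_accounts)
    # Holders without an account need one created.
    to_create = sorted(holders - existing)
    # Accounts whose owner no longer holds the capability get archived.
    to_archive = sorted(existing - holders)
    return to_create, to_archive


if __name__ == "__main__":
    # Hypothetical usernames, for illustration only.
    holders = ["s1234567", "s2345678", "s3456789"]
    accounts = ["s2345678", "s0000001"]
    create, archive = reconcile_accounts(holders, accounts)
    print("create:", create)
    print("archive:", archive)
```

Running a step like this periodically, and feeding the two lists to whatever tool performs the actual account changes, would replace the account admin tasks currently done by hand.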

-- IainRae - 06 Mar 2019

Topic revision: r2 - 06 Mar 2019 - 11:59:51 - IainRae
 