Merged MLP/MSc Teaching Clusters
Overview
This project's main aim is to merge the MSc teaching cluster and the MLP cluster to provide a full production cluster which is suitable for use by all teaching students. The project involves a number of sub-projects and will produce a number of deliverables which can be used individually elsewhere (slurm component, distributed filesystem, file distribution system etc.). It runs in parallel with the MLP hardware project [link]. We will use this technology and some of the infrastructure with the CDT cluster and a possible future research GPU cluster.
The starting point for the project is the MLP cluster as it stood at the end of the MLP course last year, when it had this configuration:
- /home filesystem based on the 5 letha nodes using gluster
- slurm scheduler running on a VM
- configuration delivered by file component
- modified FIFO scheduler
- account admin tasks done by hand
- 3 head nodes made up of one new server and two hand-me-downs
- 25 GPU nodes spread over 3 sites
- basic monitoring using nagios
- basic performance monitoring using ganglia
We are aiming to move to a full production system which will involve:
- LCFG configured slurm
- accounts created and archived automatically driven by capabilities
- fair share prioritisation driven by capabilities
- Low-priority use (with preemption) of underused nodes, available to users based on priority
- File distribution system: a low-latency method of deploying files to nodes' local disks
  - probably using torrent
  - ideally integrated with the scheduler to deliver files to scheduled nodes
- LCFG configured distributed filesystem
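As a rough illustration of what "driven by capabilities" could mean for the account and fair share items above, here is a minimal Python sketch. It is not the project's implementation: the function names, the plain sets of usernames standing in for a capability lookup, and the 2^(-usage/share) weighting are all assumptions made for the example. The first function works out which accounts to create and which to archive by comparing the users who hold a capability with the accounts that already exist; the second turns a user's share and recent usage into a priority factor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountPlan:
    """Accounts to create and to archive, sorted for stable output."""
    create: list
    archive: list


def plan_account_changes(entitled, existing):
    """Compare users holding a (hypothetical) cluster capability with the
    accounts that currently exist on the cluster."""
    entitled, existing = set(entitled), set(existing)
    return AccountPlan(
        create=sorted(entitled - existing),   # entitled but no account yet
        archive=sorted(existing - entitled),  # account but no longer entitled
    )


def fair_share_factor(usage_fraction, share_fraction):
    """Assumed weighting: 1.0 with no recent usage, 0.5 when usage equals
    the allocated share, falling towards 0 as usage exceeds the share."""
    if share_fraction <= 0:
        return 0.0  # no share allocated, so lowest priority
    return 2.0 ** (-usage_fraction / share_fraction)


if __name__ == "__main__":
    plan = plan_account_changes({"alice", "bob"}, {"bob", "carol"})
    print("create:", plan.create, "archive:", plan.archive)
    print("factor:", fair_share_factor(0.25, 0.25))
```

In a real deployment the two input sets would come from the capability source and from the scheduler's accounting database, and the share for each user would be derived from the capabilities they hold.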
-- IainRae - 06 Mar 2019