RAT Unit Meeting -- 18-11-2019

Projects

CompProj:379 Review of DICE desktop platform

  • gdutton: do completion report in BPW so we can end project
  • gdutton: create new project in BPW as purpose has changed to wider than original remit
  • high priority

CompProj:392 Live Chat Service

  • discussed with CSOs, many opinions, another non-queued interruption would be a problem
  • looked at some packages
  • can we use Teams? It has a mobile app and the web client is fine on Linux, but automation may be needed to make it work like live chat, including returning to RT
  • gdutton: written report on status and options in BPW
  • low priority

CompProj:405 Map of the Taught Student's Labs

  • timc: do completion report and DPIA in BPW
  • timc: add iOS web interface onto main site (or groups.inf) and add link on documentation page in BPW

CompProj:417 Roles Management

  • gdutton: helped with Chris's hadoop roles, useful to crystallise procedures, maybe make into documentation
  • gdutton: meeting Toby next week for remaining deliverables
  • finish project when roles supremo page is sufficient
  • spoke to Alison about Learners role and network capabilities

CompProj:420 Hadoop

  • we need to decide whether we can auto-delete and/or set a retention policy
    • could have a two-stage process, i.e. archive first then delete, with an email to say the data is still accessible but jobs cannot be run, and that it will be deleted after a set period of time otherwise
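
A minimal sketch of the two-stage idea above (archive first, then delete), assuming idle time is measured from a user's last activity. The period lengths and the function name are hypothetical; the minutes do not fix any values.

```python
from datetime import date, timedelta

# Hypothetical periods, for illustration only.
ARCHIVE_AFTER = timedelta(days=180)  # idle time before data is archived
DELETE_AFTER = timedelta(days=90)    # further idle time before deletion


def retention_stage(last_active: date, today: date) -> str:
    """Return 'active', 'archived' or 'delete' for a user's data."""
    idle = today - last_active
    if idle < ARCHIVE_AFTER:
        return "active"
    if idle < ARCHIVE_AFTER + DELETE_AFTER:
        # Stage one: data accessible but no jobs; the warning email goes out here.
        return "archived"
    # Stage two: the set period has passed without the user responding.
    return "delete"
```

Keeping the decision in one pure function would let the same rule drive both the warning email and the eventual delete.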

CompProj:455 Mock REF Reviewer System

  • timc: do completion report in BPW
  • timc: do DPIA, or decommission (check with Steve/Victoria)

CompProj:463 Merged MLP/MSC Teaching Clusters

  • timc: met with iainr and added/extended/closed deliverables
  • script done to generate auto.homes file driven by caps - important so we can separate users onto the correct filesystem, teaching or research
    • add ability to get config from file
    • now shifting people about, along with the amount of data involved means this will take a while to do
  • fixing script managing home directories as Python upgrade broke it
  • then finish component, hopefully LCFG manageable by xmas
  • need to do a DPIA
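
A minimal sketch of generating an auto.homes map from capabilities, as described above. This is not the actual script: the capability names, the NFS export paths and the rule that research wins over teaching are all assumptions made for illustration. Each output line follows the usual autofs map layout of key followed by location.

```python
from typing import Dict, Set

# Hypothetical capability names and NFS exports, for illustration only.
EXPORTS = {
    "research": "resfs.example.org:/export/research",
    "teaching": "teachfs.example.org:/export/teaching",
}


def auto_homes(user_caps: Dict[str, Set[str]]) -> str:
    """Build autofs map text placing each user on one filesystem."""
    lines = []
    for user in sorted(user_caps):
        # Assumption: research takes precedence when a user holds both caps.
        for kind in ("research", "teaching"):
            if kind in user_caps[user]:
                lines.append(f"{user}\t{EXPORTS[kind]}/{user}")
                break
        # Users with neither capability get no entry.
    return "\n".join(lines) + "\n"
```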

CompProj:464 User access to last/ps/w etc

  • gdutton: concrete proposals report next week in BPW

CompProj:465 Teaching Software 2018/19

  • rwb: done, timc: tick off deliverables etc

CompProj:470 Personalised Portal Page for Academic Staff

  • ongoing, "select" academics testing; need to demo for admin; core development is done; many reports to add or transfer from Portal but that is not part of this project
  • continue on documentation
  • continuing on migrating some TSP processes
    • server-side validation done and working
    • added access control to allow admin staff to view
    • still todo form 2 handling
    • still todo overall reports

CompProj:472 Procure PGR GPU Cluster

  • done bar 10Gb infrastructure
    • 10Gb cards now pulled
    • 10Gb switch now populated with 10Gb modules
  • timc: start completion process in BPW
  • iainr: discuss infrastructure with gdmr/idurkacz

CompProj:506 Teaching Software 2019/20

  • S2: kaldi - will add 10GB to root partition, preferably using Intel math libraries which are faster but add another 11GB, fortunately Graham has just made 20GB savings
    • maths libraries going on this week
    • been rebuilt ready for users to test
    • include on GPU nodes
  • S2: kaldi custom version was required for python

HTTP -> HTTPS check

  • rwb to investigate

User Facing MHR Data Asset Register

  • gdutton: liaise with cms, we have an internal system - capture requirement and develop for users

GPU Approved Supplier Procurement

  • timc: speak to iainr

Misc Development

  • trac rattrap for ISS users, will use pgluser to manage
    • timc: added user tickets
    • gdutton: added crontab to pull git repositories
    • gdutton: email works now

Operational

  • lab programming exams
    • two mock exams running for INF1A, still without papers and still nowhere near meeting any of the requirements of the exam checklist
      • one this Wednesday, one on Friday, many next week and TSPL on 29th
      • need to remind everyone of need to meet checklist in advance
      • DoT to be informed if exam proceeds without checklist done, check current status with Alison
      • strike action next week will have a potential effect
      • real exams on 5th Dec (Thu) and 10th Dec (Tue); Graham on annual leave on 5th, Richard here on 10th

  • 7.6 upgrades - scheduling nodes - python change breaks scripts and they need rewritten so hold off on these
    • pgteach (is 7.6): teacake now 7.6 and postgres 11.5, just need to schedule move
    • CDT scheduler still to do as will break scripts, plus glorious KVM server (should be no problem)
      • maybe look at live migration to avoid cluster downtime
    • RT lifespan/retention plus other things - discussed at ops
      • add accounts by ldap
      • look at merging emails
      • look at purging and retention rules
      • run pre move to postgresql scripts

  • h/w security decommission
    • wasserboxer - move bridgeport off and just switch off
    • fondant - just switch off
    • arcsim - in contact with Bjoern - timc to chase
    • redsea - some wish to keep for RAT test cluster usage as VM host plus GPU card host, timc to check with Alastair
    • bocian/blanik - in contact with Douglas
      • gdutton: have an SL7 server running with everything required, Rob now needs to migrate
    • henwen + wilbur - in contact with Amos/Charles, timc to chase

  • iainr/rwb/gdutton: crypt server
    • up for 10 days ... iainr declaring this fixed
      • iainr going to bring down to change name back and disable onboard network
    • Daryl suggested shipping back (is that on-site?), would have to pull disks as MHR data on them

  • iainr: 1 gpu server still to install - power issues affecting flexibility of physical installation
    • only available rack location won't work as-is, infrastructure fixing

  • rwb/iainr: moved "hannah" into PGR cluster with Slurm queue for private user use
    • timc: contacted user, no response

  • iainr has got power figures from "hannah", max power draw about 8.5A with all GPUs running but if running CPUs power draw goes down
    • completely switched off draws 0.7A (3x0.25A) - i.e. just BMC
    • can't measure just one T630 as multiple configurations
    • would be worth checking to see if IPMI power stats align
    • need to assess all racks to see how close we are to limits
    • iainr: we will do Novatech ones and Tyan one, infrastructure could do the rest
      • more figures from Novatech server, even worse
      • trying to get figures off Tyan before handover
      • ongoing discussion with infrastructure over actual requirements
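
A minimal sketch of the rack assessment mentioned above, assuming a per-rack current limit and a list of maximum draws per server. The 16 A limit and the rack layout are hypothetical; only the 8.5 A maximum for "hannah" comes from the figures above.

```python
from typing import Dict, List


def rack_headroom(limit_amps: float, racks: Dict[str, List[float]]) -> Dict[str, float]:
    """Return remaining amps per rack given each server's maximum draw."""
    return {name: limit_amps - sum(draws) for name, draws in racks.items()}


# Hypothetical 16 A limit and layout; 8.5 A is the "hannah" maximum draw.
headroom = rack_headroom(16.0, {"rackA": [8.5, 8.5], "rackB": [8.5]})

# Racks whose worst-case draw exceeds the limit.
overloaded = sorted(name for name, amps in headroom.items() if amps < 0)
```

The same calculation could be rerun with IPMI readings to see whether they align with the measured figures.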

  • doing DPIAs:
    • we need to do all our services
    • aburford: need to do one for ProctorU
    • timc: doing WhosOff

  • tophat attendance
    • privacy question, may not be specifically tophat, sitting with IS for comment

  • looking at live capture options
    • live broadcast audio much better, latency in transcription may be significant

  • course questionnaire mid semester feedback

  • gdutton: suggest revised procedures for running lab exams

  • exam prep machines
    • wrong online procedural docs need to be removed and replaced - USU needs to do

  • intermittent filesystem weirdness on damnii nodes
    • tests suggest not SSD at fault, despite seeming to be an NVMe issue
    • seen on a few nodes, maybe kernel driver?
    • switched damnii02 to XFS to isolate whether it is an EXT4 driver issue
    • also upgrading BIOS to see if that fixes the console problem

  • Python3 conversion - we have a bit to do

  • PG v12 - no OIDs anymore, affects TheonUI

AOCB

Topic revision: r292 - 18 Nov 2019 - 15:10:03 - Main.TimColles
 