RAT Unit Meeting -- 11-11-2019

Projects

CompProj:379 Review of DICE desktop platform

  • gdutton: do completion report this week (3rd Time Lucky) so we can end project
  • gdutton: create new project, as the purpose is now wider than the original remit
  • high priority

CompProj:392 Live Chat Service

  • discussed with CSOs, many opinions; another non-queued interruption would be a problem
  • looked at some packages
  • can we use Teams? It has a mobile app and the web client is fine on Linux, but automation may be needed to make it work like live chat, including returning to RT
  • gdutton: written report on status and options this week
  • low priority

CompProj:405 Map of the Taught Student's Labs

  • timc: do completion report and DPIA
  • timc: add iOS web interface onto main site (or groups.inf) and add link on documentation page

CompProj:417 Roles Management

  • gdutton: helped with Chris's hadoop roles, useful to crystallise procedures, maybe make into documentation
  • gdutton: meeting Toby next week for remaining deliverables
  • finish project when roles supremo page is sufficient

CompProj:420 Hadoop

  • we need to decide whether we can auto-delete and/or set a retention policy
    • could have a two-stage process, i.e. archive first then delete, with an email to say the data is still accessible but jobs cannot be run, and that it will be deleted after a set period of time otherwise
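The two-stage idea above could be sketched roughly as below. This is only an illustration: the periods (ARCHIVE_AFTER, DELETE_AFTER_ARCHIVE), the Action names and the next_action function are all hypothetical, nothing was decided at the meeting, and a user responding to the email is not modelled.

```python
from datetime import date, timedelta
from enum import Enum


class Action(Enum):
    NONE = "none"        # leave the account as it is
    ARCHIVE = "archive"  # stage 1: data still readable, jobs disabled, email sent
    DELETE = "delete"    # stage 2: remove data once the grace period has passed


# Hypothetical periods; no values were set at the meeting.
ARCHIVE_AFTER = timedelta(days=365)
DELETE_AFTER_ARCHIVE = timedelta(days=90)


def next_action(last_active, archived_on, today):
    """Decide the next retention step for one user's data."""
    if archived_on is None:
        # Not yet archived: archive only after a long period of inactivity.
        if today - last_active >= ARCHIVE_AFTER:
            return Action.ARCHIVE
        return Action.NONE
    # Already archived: delete once the grace period has elapsed.
    if today - archived_on >= DELETE_AFTER_ARCHIVE:
        return Action.DELETE
    return Action.NONE
```

Keeping the decision separate from the code that actually archives or deletes would let the policy be reviewed and tested before anything destructive is enabled.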

CompProj:455 Mock REF Reviewer System

  • timc: do completion report
  • timc: do DPIA, or decommission (check with Steve/Victoria)

CompProj:463 Merged MLP/MSC Teaching Clusters

  • timc: meet with iainr this week to add/extend deliverables
  • new nodes installed into the cluster, in queues, accessible
  • pgr filesystem servers installed and filesystem created
  • script done to generate auto.homes file driven by caps - important so we can separate users onto the correct filesystem, teaching or research
    • add ability to get config from file
    • now shifting people about; the amount of data involved means this will take a while
  • fixing script managing home directories as Python upgrade broke it
  • then finish component, hopefully LCFG manageable by xmas
  • need to do a DPIA
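For reference, a minimal sketch of generating auto.homes entries from capabilities, as described above. It is not the actual script: the function name, the "research" capability name and the server paths are hypothetical. The only thing assumed about the output is the usual autofs indirect map layout of a key followed by a location.

```python
def auto_homes_lines(users, teaching_fs, research_fs, research_cap="research"):
    """Return one autofs map line per user, picking the filesystem by capability.

    users maps username -> set of capability names (hypothetical data shape).
    """
    lines = []
    for username in sorted(users):
        caps = users[username]
        # Users holding the research capability go on the research filesystem,
        # everyone else stays on the teaching filesystem.
        fs = research_fs if research_cap in caps else teaching_fs
        lines.append(f"{username}\t{fs}/{username}")
    return lines


if __name__ == "__main__":
    example_users = {"alice": {"research"}, "bob": {"teaching"}}
    for line in auto_homes_lines(
        example_users,
        teaching_fs="teachfs.example:/export/home",   # hypothetical server
        research_fs="pgrfs.example:/export/home",     # hypothetical server
    ):
        print(line)
```

Reading the two filesystem locations from a file rather than passing them in would cover the "config from file" item.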

CompProj:464 User access to last/ps/w etc

  • gdutton: concrete proposals report next week

CompProj:465 Teaching Software 2018/19

  • rwb: completion report this week

CompProj:470 Personalised Portal Page for Academic Staff

  • ongoing, "select" academics testing; need to demo for admin; core development is done; many reports to add or transfer from Portal but that is not part of this project
  • continue on documentation
  • continuing on migrating some TSP processes to test callback functionality - really just validation now, then form 2 handling

CompProj:472 Procure PGR GPU Cluster

  • done bar 10Gb infrastructure
    • need to pull 10Gb cards used during initial testing
    • 10Gb switch now populated with 10Gb modules
  • timc: start completion process
  • timc: discuss infrastructure with iainr/idurkacz

CompProj:506 Teaching Software 2019/20

  • S1 done
  • late requests should not receive priority over research cluster work
  • S2: kaldi - will add 10GB to root partition, preferably using Intel math libraries, which are faster but add another 11GB; fortunately Graham has just made 20GB savings
    • maths libraries going on this week
    • been rebuilt ready for users to test
    • include on GPU nodes
  • S2: kaldi custom version was required for python

HTTP -> HTTPS check

  • rwb to investigate

User Facing MHR Data Asset Register

  • gdutton: liaise with cms, we have an internal system - capture requirement and develop for users

GPU Approved Supplier Procurement

  • timc: speak to iainr

Misc Development

  • trac rattrap for ISS users, will use pgluser to manage
    • gdutton: done
    • gdutton: letsencrypt and pgluser done
    • timc: will add user tickets in the interim
    • timc: add crontab to pull git repositories
    • gdutton: check email still works

Operational

  • RT - autoresolver added, ready to enable pending review of email to be sent out
    • done, not many have come back

  • 7.6 upgrades - scheduling nodes - Python change breaks scripts and they need to be rewritten, so hold off on these
    • infdb has been upgraded to 7.6 and postgresql 11.5; marzipan done; mochi done
    • iainr: issrt done
    • pgteach (is 7.6): needs to move to teacake and postgres 11.5
    • CDT scheduler still to do as will break scripts, plus glorious KVM server (should be no problem)
      • maybe look at live migration to avoid cluster downtime
    • RT lifespan/retention plus other things - add to discussion topic for next ops

  • h/w security decommission
    • wasserboxer
    • fondant
    • arcsim - in contact with Bjoern
    • redsea - can go now
    • bocian/blanik - in contact with Douglas
      • gdutton: have an SL7 server running with everything required, Rob now needs to migrate
    • henwen + wilbur - in contact with Amos/Charles

  • iainr/rwb/gdutton: crypt server
    • up for 3 days ... ! using USB network dongle with onboard network kernel module pulled
    • Daryl suggested shipping back (is that on-site?), would have to pull disks as MHR data on them

  • iainr: 1 gpu server still to install - power issues affecting flexibility of physical installation
    • only available rack location won't work as-is, infrastructure fixing
    • "youyou" has been replaced by "marax"
  • rwb/iainr: moved "hannah" into PGR cluster with Slurm queue for private user use
    • rwb documented
    • timc: contact user
  • iainr has got power figures from "hannah": max power draw is about 8.5A with all GPUs running, but if running CPUs power draw goes down
    • completely switched off draws 0.7A (3x0.25A) - i.e. just BMC
    • can't measure just one T630 as multiple configurations
    • would be worth checking to see if IPMI power stats align
    • need to assess all racks to see how close we are to limits
    • iainr: we will do Novatech ones and Tyan one, infrastructure could do the rest
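A rough sketch of the rack assessment arithmetic mentioned above, assuming each rack has a known current limit and each server a measured maximum draw. The 16A limit and the 8.1A IPMI reading in the example are made-up numbers; only the 8.5A figure for "hannah" comes from these notes.

```python
def rack_headroom(limit_amps, max_draws_amps):
    """Remaining current (in amps) if every server in the rack runs at maximum draw."""
    return limit_amps - sum(max_draws_amps)


def readings_align(measured_amps, ipmi_amps, tolerance=0.1):
    """True if the IPMI figure is within a relative tolerance of the measured one."""
    return abs(measured_amps - ipmi_amps) <= tolerance * measured_amps


if __name__ == "__main__":
    # 16.0 is a hypothetical rack limit; 8.5 is the measured maximum for "hannah".
    print(rack_headroom(16.0, [8.5]))
    # 8.1 is a hypothetical IPMI reading to compare against the measurement.
    print(readings_align(8.5, 8.1))
```

Summing maximum draws is deliberately pessimistic, which suits a check of how close each rack is to its limit.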

  • doing DPIAs:
    • we need to do all our services
    • aburford: need to do one for ProctorU
    • timc: doing WhosOff

  • tophat attendance - privacy question, may not be specifically tophat, sitting with IS for comment

  • looking at live capture options
    • live broadcast audio much better, latency in transcription may be significant

  • course questionnaire mid semester feedback

  • gdutton: suggest revised procedures for running online lab exams

  • exam prep machines
    • problems with "technical debt" and loss of knowledge
    • wrong online procedural docs need to be removed and replaced
    • alisond has been doing training

AOCB

Topic revision: r291 - 11 Nov 2019 - 15:09:41 - Main.TimColles
 