RAT Unit Meeting -- 20-01-2020

Projects

CompProj:379 Review of DICE desktop platform

  • gdutton: completion report in progress -- gdutton: link this!
  • gdutton: create new project once report is done; the purpose has grown wider than the original remit

CompProj:392 Live Chat Service

  • discussed with CSOs; many opinions; another non-queued interruption would be a problem
  • looked at some packages
  • can we use Teams? It has a mobile app and the web client is fine on Linux, but automation may be needed to make it work like live chat, including returning to RT
  • gdutton: written report on status and options
  • gdutton: revisited some work on Mattermost for other purposes
  • low priority

CompProj:417 Roles Management

  • gdutton: met Toby for remaining deliverables, report written
  • finish project when roles supremo page is sufficient

CompProj:420 Hadoop

  • timc: check if dpia needed, do one if so
  • we need to decide if we can auto-delete and/or retention policy
    • could have a two-stage process, i.e. archive first, then delete: email the user to say the data is accessible but jobs cannot be run, and that it will be deleted after a set period of time otherwise
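
A minimal sketch of the two-stage idea above, in Python. The 90-day and 30-day periods and the function name `retention_action` are placeholders for illustration only; no retention policy has been agreed, and nothing here is tied to how Hadoop data is actually stored.

```python
from datetime import date, timedelta

# Hypothetical periods; the actual retention policy is still to be decided.
ARCHIVE_AFTER = timedelta(days=90)  # inactivity before data is archived
DELETE_AFTER = timedelta(days=30)   # time spent in archive before deletion


def retention_action(last_active, archived_on, today):
    """Return 'keep', 'archive' (email the user at this point), or 'delete'."""
    if archived_on is None:
        # Stage 1: live data is archived once it has been idle long enough.
        if today - last_active >= ARCHIVE_AFTER:
            return "archive"
        return "keep"
    # Stage 2: archived data is deleted once the set period has passed.
    if today - archived_on >= DELETE_AFTER:
        return "delete"
    return "keep"
```

The email to the user would be sent when the function returns 'archive', so the deletion clock only starts after they have been told.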

CompProj:463 Merged MLP/MSC Teaching Clusters

  • script done to generate auto.homes file driven by caps - important so we can separate users onto the correct filesystem, teaching or research
    • add ability to get config from file
    • now shifting people about; the amount of data involved means this will take a while
  • script managing home directories now running in reporting mode for safety
  • then finish component, hopefully LCFG manageable
  • need to do a DPIA
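
A rough sketch of how a caps-driven auto.homes map could be generated, in Python. This is not the actual script: the capability name `research`, the server names `teachfs` and `resfs`, and the export paths are all hypothetical, and the output assumes the usual autofs indirect-map layout of key, options, location.

```python
# Hypothetical capability name and export paths, for illustration only.
RESEARCH_CAP = "research"
EXPORTS = {
    "teaching": "teachfs:/export/home",
    "research": "resfs:/export/home",
}


def auto_homes_lines(user_caps):
    """Build autofs indirect-map lines from a {username: set of caps} mapping."""
    lines = []
    for user in sorted(user_caps):
        # Users holding the research capability go to the research filesystem;
        # everyone else defaults to teaching.
        kind = "research" if RESEARCH_CAP in user_caps[user] else "teaching"
        lines.append(f"{user}\t-rw\t{EXPORTS[kind]}/{user}")
    return lines
```

Returning the lines rather than writing the file directly fits the reporting-mode approach: the caller can print or diff the result before anything is changed.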

CompProj:464 User access to last/ps/w etc

  • gdutton: concrete proposals report THIS WEEK

CompProj:470 Personalised Portal Page for Academic Staff

  • ongoing, "select" academics testing; need to demo for admin; core development is done; many reports to add or transfer from Portal but that is not part of this project
  • continue on documentation
  • status: very soft launch!

CompProj:472 Procure PGR GPU Cluster

  • iainr: discuss infrastructure with gdmr/idurkacz
    • add to spending plan if appropriate

CompProj:506 Teaching Software 2019/20

  • Node.js added (gdutton to send rwb RT#)
  • S2: nbgrader remains
  • Rust / cargo now available (not deployed)

HTTP -> HTTPS check

  • rwb to investigate, piggy-backing on neilb efforts
  • Nothing major anticipated.

CompProj:539 User Facing MHR Data Asset Register

  • gdutton: liaise with cms, we have an internal system - capture requirement and develop for users
  • middling priority

GPU Approved Supplier Procurement

  • initial discussion with gdutton/iainr complete
  • mini tender in progress for specific order; tight deadline, ostensibly arbitrary limitations
  • some items will have to be purchased as "call-off"
  • ITT due 22nd, delivery scheduled end March.

Misc Development

  • continuing on migrating some TSP processes
    • server-side validation done and working
    • added access control to allow admin staff to view
    • still to do: form 2 handling
    • overall reports done
    • gdutton: other TSP enhancement

  • RT lifespan/retention plus other things - discussed at ops
    • DPIA should precede these changes.
    • add accounts by ldap
    • look at merging emails - needs QA process
    • look at purging and retention rules
    • run pre move to postgresql scripts

Operational

  • 7.6 upgrades
    • CDT VM host is last

  • h/w security decommission
    • wasserboxer - move bridgeport off - pending replacement hardware (flapjack), due this week
    • fondant - off, needs to be physically removed
    • arcsim - ssh access only (iptables), also wants a VM server for arcsimvm1 (flapjack)
    • bocian/blanik/karenin
      • all ssh access only while migration completed by Rob, manual iptables configuration
    • henwen + wilbur - due for removal now, rack space is precious

  • GPU networking
    • 10gb cards now pulled
    • 10gb switch now populated with 10gb modules

  • iainr has got power figures from "hannah": max power draw is about 8.5A with all GPUs running, but power draw goes down if running CPUs
    • data can be collected live from the Dells, so we can get a historical maximum; need to check if it is higher than the aircon maximum

  • doing DPIAs:
    • we need to do all our services
      • Webmark
      • Theon
      • TheonPortal
      • ProjSubs, Projects-Archive
      • DPMT
      • Slurm
      • RT4 (can use some of Unidesk replacement one)
      • License server logs
      • Lab exam ?
    • aburford: need to do one for ProctorU
    • timc: doing WhosOff (maybe)
    • privacy statement for:
      • Codegrade
      • Webmark

  • tophat attendance - privacy question, may not be specifically tophat, sitting with IS for comment

  • looking at live capture (transcript) options
    • live broadcast audio much better, latency in transcription may be significant
    • doing more testing with disability office
      • issues with videoconferencing hampering efforts; loan hardware should help
    • likely to be a compromise approach

  • intermittent filesystem weirdness on damnii nodes
    • probably SSD - yes it was
    • SSDs are easy to replace - in the front 1/3 panel

  • lennoxtown still with Novatech, some response since 1st Jan

  • ubatuba also still broken (2 GPU failures)
    • will wait on lennoxtown and then do a GPU swap to test

  • laphroaig still falling over weekly(?) in use
    • also partially knocks over BMC, further investigation required

  • PG v12 - no OIDs anymore, affects TheonUI
    • synthetic oid column is a minimal safe fix

  • Theon (meringue) downtime
    • reboot tickled an invalid configuration, which should've been removed earlier.
    • lots of monitoring due to desktop reboot - also marzipan monitoring wasn't working properly (now fixed)
    • need to deal with Learn embeds - specific 403 for learn subdirectory would be useful
    • would be nice to monitor configuration failures at component level -- gdutton to bugzilla

  • shuffling data from teaching cluster to research cluster
    • iain will do a few and write procedure so richard can then assist

  • pgteach move to teacake / v11
    • underway

  • mattermost trial
    • vs. Slack, vs. MS Teams
    • check if we can use EPCC service on a trial basis
    • Inf VM seems to be next best option if we must host it.
    • SSO/Authentication can be added to Open Source edition.

AOCB

Topic revision: r294 - 20 Jan 2020 - 15:03:46 - Main.GrahamDutton