RAT Unit Meeting -- 27-01-2020

Projects

CompProj:379 Review of DICE desktop platform

  • gdutton: completion report in progress
  • gdutton: create new project once report done: purpose has changed to wider than original remit

CompProj:392 Live Chat Service

  • discussed with CSOs: many opinions; another non-queued interruption would be a problem
  • looked at some packages
  • can we use Teams? It has a mobile app and the web client is fine on Linux, but automation may be needed to make it behave like live chat, including returning to RT
  • gdutton: interim report on status and options pending
  • gdutton: revisited some work on Mattermost for other purposes
  • low priority

CompProj:417 Roles Management

  • gdutton: met Toby for remaining deliverables, report written. Notes at gdutton:EntitlementTools
  • finish project when roles supremo page is sufficient

CompProj:420 Hadoop

  • timc: check if DPIA needed: yes
  • we need to decide if we can auto-delete and/or retention policy
    • could have a two-stage process, i.e. archive first then delete, with an email to say the data is still accessible but jobs cannot be run, and that it will be deleted after a set period of time otherwise
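
The two-stage idea above could be sketched roughly as follows. This is only an illustration: the function name, the Action values and both time periods are hypothetical, and the real retention periods are still a policy decision.

```python
from datetime import date, timedelta
from enum import Enum


class Action(Enum):
    NONE = "none"        # nothing due yet
    ARCHIVE = "archive"  # stage 1: data stays readable, jobs disabled, email sent
    DELETE = "delete"    # stage 2: archived data removed


# Hypothetical periods, not agreed values.
INACTIVE_BEFORE_ARCHIVE = timedelta(days=365)
ARCHIVED_BEFORE_DELETE = timedelta(days=90)


def next_action(last_active, archived_on, today):
    """Return the action due for one account under a two-stage policy.

    archived_on is None for accounts that have not been archived yet.
    """
    if archived_on is not None:
        if today - archived_on >= ARCHIVED_BEFORE_DELETE:
            return Action.DELETE
        return Action.NONE
    if today - last_active >= INACTIVE_BEFORE_ARCHIVE:
        return Action.ARCHIVE
    return Action.NONE
```

Running this in a report-only mode first (printing the action per account rather than acting on it) would let the periods be tuned before anything is removed.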

CompProj:463 Merged MLP/MSC Teaching Clusters

  • script done to generate auto.homes file driven by caps - important so we can separate users onto the correct filesystem, teaching or research
    • add ability to get config from file
    • now shifting people about; the amount of data involved means this will take a while to do
    • done using rsync
  • script managing home directories now running in reporting mode for safety
  • working on component, hopefully LCFG manageable
  • working on DPIA
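
As an illustration of the caps-driven auto.homes generation noted above, a minimal sketch might look like this. It is not the actual script: the server paths, the capability name and the function name are all assumptions, and the real values would presumably come from the config file mentioned above.

```python
# Hypothetical exports and capability name; not taken from the real setup.
TEACHING_EXPORT = "teachfs.example.org:/export/teaching/home"
RESEARCH_EXPORT = "resfs.example.org:/export/research/home"
RESEARCH_CAP = "cluster-research"


def auto_homes_lines(user_caps):
    """Build autofs map lines, one per user, choosing the filesystem by capability.

    user_caps maps a username to the set of capabilities that user holds.
    """
    lines = []
    for user in sorted(user_caps):
        if RESEARCH_CAP in user_caps[user]:
            export = RESEARCH_EXPORT
        else:
            export = TEACHING_EXPORT
        # autofs map format: key, whitespace, location
        lines.append(f"{user}\t{export}/{user}")
    return lines
```

For example, `print("\n".join(auto_homes_lines({"alice": set(), "bob": {"cluster-research"}})))` would emit one map line per user, with alice on the teaching export and bob on the research export.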

CompProj:464 User access to last/ps/w etc

  • gdutton: concrete proposals report continuing THIS WEEK

CompProj:470 Personalised Portal Page for Academic Staff

  • ongoing, "select" academics testing; need to demo for admin; core development is done; many reports to add or transfer from Portal but that is not part of this project
  • continue on documentation
  • status: very soft launch!

CompProj:506 Teaching Software 2019/20

  • Node.js added (RT:99400)
  • S2: nbgrader remains

HTTP -> HTTPS check

  • rwb to investigate, piggy-backing on neilb efforts
  • Nothing major anticipated.

CompProj:539 User Facing MHR Data Asset Register

  • gdutton: met with cms to capture requirements and consider development for users
  • will stall for the time being

GPU Approved Supplier Procurement

  • ITT went out on 24th Jan, deadline is 14th Feb, evaluation 15th to 21st
    • make sure procurement give us all tender related documentation from suppliers
  • some items will have to be purchased as "call-off"
  • delivery scheduled end March.

Misc Development

  • continuing on migrating some TSP processes
    • still to do: form 2 handling; some tweaks requested following review meeting
    • gdutton: other TSP enhancement
    • timc: go over changes with gdutton

  • RT lifespan/retention plus other things - discussed at ops
    • DPIA should precede these changes.
    • add accounts by ldap
    • look at merging emails - needs QA process
    • look at purging and retention rules
    • run pre move to postgresql scripts

Operational

  • 7.6 upgrades
    • CDT VM host is done

  • h/w security decommission
    • wasserboxer - move bridgeport off - flapjack due to be re-installed, due this week
    • fondant - off, needs to be physically removed
    • arcsim - ssh access only (iptables), also wants a VM server for arcsimvm1 (commonrail)
    • bocian/blanik/karenin
      • timc: all ssh access only while migration completed by Rob, manual iptables configuration
    • henwen + wilbur - due for removal now, rack space is precious

  • GPU networking
    • iainr: discuss infrastructure with gdmr/idurkacz
      • add to spending plan if appropriate
      • 10gb cards now pulled
      • 10gb switch now populated with 10gb modules

  • iainr has got power figures from "hannah", max power draw about 8.5A with all GPUs running but if running CPUs power draw goes down
    • data can be collected live from Dells so can get historical maximum, need to check if higher than aircon maximum

  • doing DPIAs:
    • we need to do all our services, in approx priority order:
      • Webmark
      • Theon
      • TheonPortal
      • RT4 (can use some of Unidesk replacement one)
      • ProjSubs, Projects-Archive
      • DPMT
      • Slurm
      • Hadoop
      • License server logs
      • Lab exam?
    • aburford: need to do one for ProctorU
    • timc: doing WhosOff (probably not)
    • privacy statement for:
      • Codegrade - draft with Rena, legal dept have written DPIA
      • Webmark

  • tophat attendance - privacy question, may not be specifically tophat, sitting with IS for comment

  • looking at live capture (transcript) options
    • live broadcast audio much better, latency in transcription may be significant
    • doing more testing with disability office
      • issues with videoconferencing hampering efforts; loan hardware should help
    • likely to be a compromise approach
    • Nick helping with testing, gone back to disability office to suggest purchasing iPad and microphone
      • we have an iPad we could use for testing - ask Alastair

  • lennoxtown still with Novatech, some response since 1st Jan
    • re-prod

  • ubatuba also still broken (2 GPU failures)
    • will wait on lennoxtown and then do a GPU swap to test

  • laphroaig still falling over weekly(?) in use
    • also partially knocks over BMC, further investigation required
    • fell over again - using an ordinary drive for OS and see what happens
    • check for newer firmware?

  • PG v12 - no OIDs anymore, affects TheonUI
    • synthetic oid column is a minimal safe fix

  • Theon (meringue) downtime
    • timc: need to deal with Learn embeds - specific 403 for learn subdirectory would be useful
    • would be nice to monitor postgresql configuration failures at component level -- gdutton to bugzilla

  • postgresql access
    • local trust access allows any authorised user to login as any user account
    • this does not affect our general usage but in the context of a further authorisation failure we would have been at risk
    • remove localhost IP access (we don't need to use it)
    • move all other trust types to peer, adds a further level of security
    • change permissions on socket to limit access by unix uid
    • should probably move nagios to gssapi - needs a little bit of work
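
The pg_hba.conf changes listed above could be previewed with something like the sketch below. It is an assumption-laden illustration, not our tooling: the function name is hypothetical, it only recognises loopback addresses written in CIDR form, and it only converts trust on "local" (unix socket) lines, since peer authentication is only available for local connections.

```python
# Loopback addresses in CIDR form; the address-plus-netmask form is not handled.
LOOPBACK = {"127.0.0.1/32", "::1/128"}


def harden_hba(lines):
    """Drop loopback 'host' entries and turn 'local ... trust' into 'local ... peer'."""
    out = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            out.append(line)  # keep blank lines and comments untouched
            continue
        if fields[0] == "host" and len(fields) >= 5 and fields[3] in LOOPBACK:
            continue  # remove localhost IP access
        if fields[0] == "local" and len(fields) >= 4 and fields[3] == "trust":
            fields[3] = "peer"  # trust -> peer on the unix socket
            out.append(" ".join(fields))
            continue
        out.append(line)  # everything else passes through unchanged
    return out
```

The socket permission change is separate from pg_hba.conf; in PostgreSQL it is controlled by the unix_socket_permissions and unix_socket_group settings in postgresql.conf.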

  • shuffling data from teaching cluster to research cluster
    • iain will do a few and write procedure so richard can then assist

  • pgteach move to teacake / v11
    • done

  • mattermost trial
    • vs. Slack, vs. MS Teams
    • going to use a team created on the EPCC service for trial
    • Inf VM seems to be next best option if we must host it.
    • SSO/Authentication can be added to Open Source edition.

  • removed final FLEXlm from two machines, including legacy matlab and Simics

  • report Rob's VM issues following edge switch reboots
    • bonding protection caused hosts to not see network failure
    • George has proposed solutions written up in infrastructure report for next ops meeting
    • gdutton: bouncing guest media state should avoid reboot, liaising with Rob to test

  • ILCC cluster
    • all these machines should be under Slurm by Wednesday
    • likely to be purchasing more

AOCB


Topic revision: r296 - 27 Jan 2020 - 15:15:01 - Main.TimColles
 