MPU Meeting Wednesday 14th February 2018

Inventory

Chris is working on a client report module for disks and RAID controllers.

User Security Training

Chris has reorganised his ThoughtsOn403 page and has some more to add to it.

Virtual Desktop

The xrdp test service is now using a Quovadis certificate. The PAM configuration has been improved so that access to SSH and XRDP services can be controlled separately. The DICE headers are now close to being finished, including the one for the staff server. A "no cookie" backend has been added to handle the MacOS client not always using one. The login screen has been improved but still needs an Informatics logo; Graham is going to see if he can come up with one for us. Since lightdm is not required, the config has been simplified by switching the screensaver back to the one supplied with MATE. We would like to use fail2ban to block attackers, but it seems that xrdp does not log the IP address for login failures. An alternative approach would be to use haproxy to do rate limiting; Stephen will investigate that option. Chris and Stephen have been working on the computing.help documentation.
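
The rate-limiting idea can be illustrated independently of haproxy. The following is a minimal Python sketch of the technique only, not haproxy configuration: it counts connection attempts per source address in a sliding window and rejects an address once it exceeds a threshold. The class name, the default of 5 attempts and the 60 second window are illustrative assumptions, not values agreed at the meeting.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` attempts per source address in any `window` seconds.

    Hypothetical helper for illustration; not part of xrdp or haproxy.
    """

    def __init__(self, limit=5, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the behaviour can be tested
        self.attempts = defaultdict(deque)  # source address -> attempt timestamps

    def allow(self, source):
        now = self.clock()
        recent = self.attempts[source]
        # Discard attempts that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False  # over the limit: reject without recording the attempt
        recent.append(now)
        return True
```

Because the limit is applied per source address at the front end, this approach does not depend on xrdp logging the address of failed logins.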

Miscellaneous Development

apache configs
Stephen has been using his scripts to check all the MPU apache configs. He found one minor problem in the bugs.lcfg.org config where it was still using the legacy-style Allow directive. Before we are able to disable the compatibility module we need to locate and check all .htaccess files.
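
Locating the .htaccess files could be done with a short script along these lines. This is only a sketch and is not Stephen's script: the function name is hypothetical, and it assumes the directives of interest are the ones handled by the Apache 2.4 compatibility module (Order, Allow, Deny and Satisfy).

```python
import os
import re

# Legacy access-control directives handled by mod_access_compat.
# "\b" stops "Allow" from also matching unrelated names such as AllowOverride.
LEGACY = re.compile(r"^\s*(Order|Allow|Deny|Satisfy)\b", re.IGNORECASE)


def find_legacy_directives(root):
    """Yield (path, line_number, text) for each legacy directive
    found in any .htaccess file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if ".htaccess" not in filenames:
            continue
        path = os.path.join(dirpath, ".htaccess")
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if LEGACY.match(line):
                    yield path, number, line.strip()
```

Calling list(find_legacy_directives(docroot)) for each document root would give a list of files to review by hand. Commented-out lines are ignored because the pattern only matches a directive at the start of a line.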

Project blogging
We agreed to blog weekly on project progress; the posts could then be copied into the minutes for the MPU meetings.

Operational

drupal
We need to check the version we're using for the computing.help service.

IPv6
More MPU servers now have SLAAC addresses: this week beaver, amarela, vermelha and nuthatch were done.

SL7.4
The NX server hammersmith has been upgraded to SL7.4.

journald
Alastair has been looking into setting the maximum log retention time to 1 month. We need to clarify if that is applied to per-user journals and also to inactive users.
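
A quick way to confirm what a given machine has configured is to read the setting back from journald.conf. The sketch below is an assumption-laden illustration, not the LCFG mechanism: the function name is hypothetical, it only parses the text it is given (for example the contents of /etc/systemd/journald.conf), and it does not consult drop-in files or say anything about whether old per-user journals have actually been removed.

```python
import configparser


def max_retention(conf_text):
    """Return the MaxRetentionSec value from journald.conf text, or None if unset."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(conf_text)
    # fallback covers both a missing [Journal] section and a missing option
    return parser.get("Journal", "MaxRetentionSec", fallback=None)
```

For example, passing the text "[Journal]" followed by a line "MaxRetentionSec=1month" returns the string "1month", while a file where the option is only present as a comment returns None.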

package cache virtualisation
We agreed that it is not really sensible to virtualise the package cache servers since we really need those to be available when other machines (including the KVM servers) are booting. Without them the boot process would be much slower.

user entitlements
We have various manually maintained lists of users for services such as LCFG subversion and computing.help. How should we handle "no grace" for those lists? This is probably a wider issue; Alastair will raise it at CEG. Who owns the sysman role? What about rfe access?

This Week

  • Alastair
    • Inventory project
      • continue working through TartarusWorkFlow
      • Document clientreport (eg how to add modules)
      • Document order sync code
      • Document hpreport processing script
      • Start work on final report!
      • Consider what else needs done other than docs and tidying and backups
      • Blog something....take dev meeting talks
      • and give details on how Tartarus tables are accessed to Ian D for inclusion in his privileged access discussion paper
      • Look at postgresql replication (do after shipping)
      • make ipv6 changes permanent
      • Add tartarus info to SwitchToSelfManaged
    • Schedule MPU meeting to discuss systemd ordering
    • Check sysmans (et al) have 'nograce'. Looks like they do.
    • Take a look at RT #78875
    • Look at /etc/hosts - dns issue (IPV6?)
      • work out what we need to fix current problem
    • Circulate info on RH7.3 systemd changes we may wish to consider
    • RT actions (as agreed)
    • Implement change to kvmtool to allow KVMs to be marked as disabled
      • looked at this - looks like the metadata tag isn't passed through libvirt (prior to 4.0.0), so can't be read/written by kvmtool
    • Look at Stephen's 'Thoughts on shell components'
    • Look at MPUActivitiesList
    • Start looking at https and computing.help (remove assumption that https means want cosign login)
      • wait on Neil's efforts with EdWeb
    • Chase Alison about LCFG check monitoring ( start doing again )
    • Investigate systemd reboot bug on gaivota and add some more debugging (store tree diff somewhere)
    • If in Forum server room, review MPU rack usage
    • Start upgrading MPU servers to 7.4
      • upgrade salamanca - remember to update firmware (Check whether this is needed)
    • Check that our journald configuration correctly implements our retention policy
      • It doesn't. journalctl shows entries from last year (eg May 17 for jubilee).
      • Possible solution is to set MaxRetentionSec=1month in /etc/systemd/journald.conf - but not convinced that setting this on existing machines clears up old per-user journals for non-active users
      • Implement above and keep an eye on it (wrt old per-user journals). Are lab machines using separate or unified journals? - Implemented (using a macro defined at LCFG level)
      • Report on this at next ops meeting
    • Discuss with Neil - drupal username collection re GDPR
    • Inventory stuff re GDPR
    • Look at allowing host based access control to unauthenticated Tartarus API
      • Do we really need the unauthenticated API?
        • scripts can use the hosts machine principal (though the scripts would need read access to this)
        • the performance difference isn't as significant as ascobie had feared (perhaps 30%?) - at least for 'ii query'
        • would need to cosign the web interface, but it only does one API request per web page, so performance hit shouldn't be noticeable
    • Check with Tim / George about capability for login to student machines - where are we
    • Read Chris's ThoughtsOn403
    • IPV6 remaining computing.help servers - config in stable header - add %slaac to hulp and lagun after 21/02/18
    • Look to see if we have some spare hardware we could use for RDP test service (need >= 32GB)
    • Lend Chris android device for playing with RDP
    • Check current version on drupal service - we're running the latest Drupal 7 (7.56)
    • Useful? - a script which checks how fast a machine's console log is growing (eg huge number of dbus problems on hammersmith)
      • suggest to Ian D
    • On hammersmith - serial console "l0 break" not having any effect
      • Check works now
    • Blog on projects
    • Create a web page (only MPU access) that we can use to record manually configured entitlements (eg Drupal, LCFG subversion) - created MPUManualEntitlements
      • Raise at CEG that all units may have manually configured systems
      • Check re ordershost and rfe access
    • KVM pcid
      The following config worked on 'brent' (hosted on vermelha). We might need to consider whether we want "match='exact'" wrt migrations.
        <cpu mode='host-model' match='exact'>
          <model fallback='allow'>IvyBridge</model>
          <vendor>Intel</vendor>
          <feature policy='require' name='pcid' />
        </cpu>
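
If this needs checking across several guests, the <cpu> fragment above can be tested for the pcid requirement programmatically. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the XML is supplied as a string rather than fetched from libvirt.

```python
import xml.etree.ElementTree as ET

# The <cpu> element as recorded in these minutes.
CPU_XML = """
<cpu mode='host-model' match='exact'>
  <model fallback='allow'>IvyBridge</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='pcid' />
</cpu>
"""


def requires_feature(cpu_xml, name):
    """Return True if the <cpu> element requires the named CPU feature."""
    cpu = ET.fromstring(cpu_xml)
    return any(
        feature.get("name") == name and feature.get("policy") == "require"
        for feature in cpu.iter("feature")
    )
```

With the fragment above, requires_feature(CPU_XML, "pcid") is true; a feature that is not listed yields false.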

  • Chris
    • Inventory project
      • Continue work on clientreport modules for replacing firmwarereport
    • Look at MPUActivitiesList
    • Look at RT
    • Continue work on SL7 coordination final project report (currently pending other units completing)
    • If in Forum server room, review MPU rack usage
    • libvirt - test for memory leaks (wrt console servers) - Ian will test it for memory leaks after the 17 January stable release
    • User training materials project #403
    • Add %slaac to jornets and aegean (after fixing http config)
    • Move metropolitan 600GB pair to circle and repartition /dev/sda to use all of disk (giving some space for KVM guest storage)
    • Tidy up hammersmith and jubilee profiles
    • Blog on projects

  • Stephen
    • LCFG client refactor stage 2
    • RT actions (as agreed)
    • submit polkit bug to redhat - with Alastair (still exists under 7.3)
    • Produce some text for systemd mount bug (to submit to RH)
    • Take the issue of disabling per-user journald logs on certain servers to OPS
    • Schedule jubilee downtime to move to SOL
    • Consider PD work for after LCFG client ...
      • looking at Ceph
    • Look at MPUActivitiesList
    • On metropolitan, find the fastest baud rate at which we can drive the real physical consoles. (This is so we can decide whether to use physical consoles for KVM servers.)
    • Look at where we're using ALL in access.conf
    • If in Forum server room, review MPU rack usage
    • Agree with RAT how software package requests are handled - waiting on Graham documenting
    • Start off NX replacement project (#389)
      • Complete Documentation
      • Look at introducing test service for staff users
    • Upgrading MPU servers to 7.4
      • NX servers - jubilee
    • Decommission DL180s in AT previously used for Ceph testing
    • Read Chris's ThoughtsOn403
    • Check whether websites are still using Allow/Deny configuration
      • Check individual .htaccess files
    • Blog on projects
    • Look at LCFG entitlements
      • SVN
      • rfe access
    • Bring LCFG v4 client project to closure

-- AlastairScobie - 14 Feb 2018

Topic revision: r10 - 23 Sep 2019 - 13:33:40 - AlastairScobie
 