Summary of pandemic planning meeting (19/08/09)
(alisond, ascobie, cms, gdmr, perdita, timc)

Scenarios
=========

Agreed: plan *NOW* for the almost certain scenario of reduced staff.

Scenarios, in order of decreasing probability and increasing severity:

* Reduced staff => drop development ring-fence
* University decrees reduced face-to-face contact (teaching, research,
  admin - e.g. tutorials, meetings)
* University decrees no teaching (continue research)
* University closes buildings (continue research)

At some point :-

 * open up any unit access restrictions
 * document any passwords (e.g. FC switches) and store them for all
   to access
 * create AFS admin principals for more computing staff

Later :-

 * freeze deployment of patches and software upgrades

Later :-

 * freeze configuration changes

Remote support for users
========================

DICE users
----------
Users can use VNC; we need to provide documentation.
We need to check the capacity of the ssh login servers, and perhaps
stop mounting home directories on all bar one of them (kept for
e.g. unison and scp users). Staff can ssh onto their
desktops; we need to think about how to spread students over
student lab machines.
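
One possible way to spread students over lab machines is to hash each
username onto a fixed list of hosts, so that a given student always lands
on the same machine and the load is roughly even. A minimal sketch,
assuming a hypothetical host list (`LAB_MACHINES`) and a hypothetical
helper (`assign_machine`); neither was discussed at the meeting:

```python
import hashlib

# Hypothetical lab host names, for illustration only.
LAB_MACHINES = ["lab01", "lab02", "lab03", "lab04"]


def assign_machine(username, machines):
    """Deterministically map a username to one of the given machines."""
    if not machines:
        raise ValueError("no lab machines available")
    # A stable hash (unlike the built-in hash()) gives the same answer
    # on every run and on every host.
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(machines)
    return machines[index]


if __name__ == "__main__":
    # Hypothetical usernames.
    for user in ["s0900001", "s0900002", "s0900003"]:
        print(user, "->", assign_machine(user, LAB_MACHINES))
```

This ignores machines that are down or already busy; the list would need
to be filtered (e.g. from monitoring data) before it is passed in.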

Laptops (and self managed machines)
-----------------------------------
* filesystem (AFS) - need updated documentation
* web - possible issues with IP address controlled content
* editing web content
      - editing via ssh is ok
      - need to check how non unix skilled staff edit www.inf
        web content
* IS hosted usenet groups (for teaching courses) - need to check
  whether IP address controlled.
* openvpn - solves the IP address control issues. Make it a service (documentation?)
* school DB - secured VNC - need documentation

Service continuity
==================

Critical services (and people SPFs)
-----------------------------------

Although we probably don't have any true single points of failure (SPFs),
there are a number of critical services where we need to improve skill
coverage.

These are :-

* network
* serial consoles / remote power management
* storage arrays / SAN
* LCFG release mechanism / package service
* virtualisation infrastructure
* AFS
* TiBS / retrospect / Sun networker
* School DB
* plone

The following were identified as critical, but we believe there
is sufficient skill coverage.

* LDAP, kerberos, cosign
* traditional web based services
* wiki
* RT 
* ssh servers

Actions
=======

* Employ FC multipath and ethernet bonding wherever possible (esp.
  for critical services). (NEW - also move service-related configuration
  from individual machine profiles into header files, to make it easier
  to move services from one machine to another.)
* Add nagios monitoring wherever possible (esp. critical services)
* Check capacity of ssh login servers (and consider whether to
  stop mounting home directories)
* Consider how to spread remote students over student lab machines
* Consider how non unix skilled staff edit www.inf content
* Check whether IS hosted usenet groups are IP address controlled
* Documentation
  - AFS
  - VNC
  - openvpn?
* Identify any single points of failure in remote management of machines
* Improve skill coverage for those critical services identified to have
  weak coverage.
* For each critical service, document how to deal with the top 5
  things that routinely need doing or go wrong.

-- AlastairScobie - 25 Aug 2009


Topic revision: r3 - 31 Aug 2009 - 19:24:50 - AlastairScobie
 