#387 Continue to consider how best to make use of central data storage

Central computing services now provide a number of data storage options to the wider University. Until now, the School of Informatics has made little use of these services, preferring to manage its own storage, but the time now seems right to reassess whether it would be beneficial to the School to make more extensive use of central storage. This project would examine the available options and provide guidance on when and how School users might make best use of these facilities. Although this will primarily be of benefit to researchers, the project should also consider options for other staff and students. There is a separate project (#369) which is aiming to move all admin staff to using DataStore for both shared space and home directory space.

The 5 main options to be considered are:

  • DataStore - DataStore is a file store for active research data, and is available to all research staff and postgraduate research students (PGRs).
  • DataSync - a Dropbox-like facility for staff and PGR students, providing 20GB of DataSync storage with the ability to connect to your personal or group share on the RDM DataStore.
  • ECDF - A large amount of high performance fibre channel storage is available to users of the Research Services compute cluster, Eddie.
  • DataShare - Edinburgh University researchers who have produced research data associated with an existing or forthcoming publication, or which has potential use for other researchers, are invited to upload their dataset for sharing and safekeeping.
  • DataVault - A service to ensure integrity and long term retention of golden copy research data, linked to PURE. Creates a Digital Object Identifier (DOI) to allow easy citation and discoverability.

DataStore

The University of Edinburgh (UoE) Research DataStore is one element of the implementation of the University's Research Data Management Policy. IS provides instructions for accessing the centrally provided DataStore filespace from Windows, macOS and self-managed Linux machines. From a Linux machine you can access DataStore using the CIFS Windows share method (apparently the preferred method), NFS or Samba. NFS is not the recommended way to connect: it is restricted by a firewall, and the machine must be registered with ECDF and have a static IP address. Samba is a recognised and well-tested, if rather outdated, means of connecting. For macOS, Samba is recommended. In all cases, the machine must be connected to the University network, either directly or via VPN.

All researchers and PGR students in the University are automatically granted a personal allocation of 500 GB on the DataStore service, and half of this can be reallocated to a shared group or project space. Additional capacity can be purchased for £175 per TB per annum and there is support for very large data (>1 PB). DataStore is fully backed up, and you can connect your DataStore personal or group share with DataSync to share your files anywhere and with anyone.

In total, only 41 of our research staff belong to a research group on DataStore.

We have local CO documentation on DataStore and also documentation on how users can access their DataStore area on DICE.

Previously we explored the possibility of AFS on ECDF/DataStore but concluded that, although we could get AFS working on DataStore, the disadvantages outweighed any advantages.

DataSync

This works well for data in DataStore. IS documentation is available but we should add a link to it from computing.help.

Can our users sync their local home directories to it? (Would we want them to be able to do this, since we already provide a backup service?)

ECDF

ECDF offers a number of services aimed at satisfying the computational requirements of researchers.

  • High performance computing, including:
    • EDDIE Linux compute cluster
    • GPGPU service
  • Data, including:
    • HPC File Storage - A large amount of high performance fibre channel storage is available to users of the Research Services compute cluster, Eddie. You can request to have group DataStore spaces mounted on the cluster using NFS.
    • Backup Services
    • Raw data services - In addition to offering file services to end users, the Research Services can also assist with the provision of large scale block device storage systems in collaboration with School IT staff.
  • Cloud Computing Service

DataShare

The DataShare website specifically mentions "Bringing together Biological Sciences, Chemistry, Engineering, GeoSciences, Informatics, Mathematics, Physics and Astronomy". "Edinburgh University researchers who have produced research data associated with an existing or forthcoming publication, or which has potential use for other researchers, are invited to upload their dataset for sharing and safekeeping. A persistent identifier and suggested citation will be provided." It appears to be used by a number of researchers in Informatics but needs to be advertised more widely. Increasingly, it is becoming a Research Council requirement that datasets are made available for others to use.

DataVault

This is an alternative to DataShare, which is an open repository. The aim of DataVault is to provide Principal Investigators and co-PIs with a safe place to store research data which they are no longer actively developing but need to keep, and which cannot be published. As this facility will use encryption, it can be used for personal data. The expected date for this service to be available is the end of February 2018.

The DataVault allows data creators at the University of Edinburgh to:

  • Store their data safely in the University’s archival storage option
  • Link this data to a record in Pure without having to re-enter any of the data
  • Optionally, receive a DOI for the data which can be used in publications and other outputs (on condition the associated Pure record is publicly accessible)
  • Comply with funder and University requirements to preserve research data for the long-term, and
  • Be confident that their data will be there for them to reuse in the future as and when required.

The DataVault Project commenced development in February 2017 and is addressing a range of requirements to ensure the service is fit for purpose across the University.

See IS DataVault pages for more information.

Costs (AFS vs DataStore)

  • AFS
    • 200GB free
    • £250 per additional 500GB + mirror £250 + tape backup £250 (one-off costs)

  • DataStore
    • 500GB free
    • £175 per additional 1TB per annum (includes backup)
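The figures above can be turned into a rough comparison for a given amount of storage. The following is only a sketch under stated assumptions, not an official pricing calculation: it assumes that AFS is charged in whole 500GB blocks, that the mirror and tape backup charges apply once per block, that DataStore is charged in whole TB with 1TB taken as 1000GB, and that the free allowances are deducted first. The function names are hypothetical.

```python
import math


def afs_one_off_cost(total_gb, mirror=True, tape_backup=True):
    """Estimated one-off AFS cost in pounds for total_gb of storage."""
    extra_gb = max(0, total_gb - 200)      # first 200GB is free
    blocks = math.ceil(extra_gb / 500)     # assumption: whole 500GB blocks
    per_block = 250                        # storage charge per block
    if mirror:
        per_block += 250                   # assumption: mirror charged per block
    if tape_backup:
        per_block += 250                   # assumption: backup charged per block
    return blocks * per_block


def datastore_cost(total_gb, years):
    """Estimated DataStore cost in pounds for total_gb over a number of years."""
    extra_gb = max(0, total_gb - 500)      # first 500GB is free
    extra_tb = math.ceil(extra_gb / 1000)  # assumption: whole TB, 1TB = 1000GB
    return extra_tb * 175 * years          # backup is included in the price


if __name__ == "__main__":
    # Example: 1200GB of research data kept for 3 years.
    print("AFS (one-off):", afs_one_off_cost(1200))         # 1500
    print("DataStore (3 years):", datastore_cost(1200, 3))  # 525
```

Because the AFS charges are one-off and the DataStore charge is annual, the comparison depends heavily on how long the data is kept; under these assumptions the 1200GB example costs the same on both after roughly eight and a half years.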

Other Links

CO effort

22 days

-- AlisonDownie - 14 Feb 2017

Topic revision: r17 - 08 Apr 2019 - 07:43:40 - AlisonDownie
 