Final Report on Project 258: Software Download Portal/Repository: Stage 1

Introduction

This report examines the current state of the Informatics Software Download Database (ISDD) and whether it is still fit for purpose. It lists the perceived shortcomings (real or imaginary) and whether they could be fixed, and describes the functionality required of any new Software Download Management Platform.

In a University-wide survey (http://www.surveygizmo.co.uk/s3/1223698/University-Software-Licensing-Survey), responses from 48 Informatics staff suggested that around 70% of them expect to produce software as part of their research. Given funding bodies' emphasis on the "impact" of research, a reliable and easy-to-use distribution mechanism is essential. Persuading people to use it is another matter, and a task in itself.

A significant proportion of folk said their preferred method of distribution was via their personal University/Informatics web page. This ad hoc approach makes it more difficult to track software access and gather statistics. We need to provide and promote a usable and useful service, either directly within Informatics or indirectly via central Information Services.

What is ISDD and what does it do?

Overview

The Informatics Software Download Database (ISDD) is a semi-automated system that lets the user control what software is uploaded and distribute it under any licensing conditions. It was deliberately kept simple, to make it easy to use and to reduce the time needed to become familiar with it.

The ISDD, http://www.inf.ed.ac.uk/research/isdd/, was originally intended as an easy way of:

  • downloading Informatics software
  • providing a single point of entry for public access to software archives
  • capturing statistics on software distribution (primarily to feed into what was then the next Research Assessment Exercise)

The University retains ownership of software (it owns the intellectual property, as part of the employment contract) whether the software is distributed through the ISDD or by any other means. The ISDD simply makes it easier to control downloads and gather statistics; uploading software to it does not change the ownership of the intellectual property.

The ISDD is kept as simple as possible to encourage people to use it: the only technical requirement for uploading code to the ISDD is a DICE account. Only one archive file (typically a ZIP or TAR file containing code, executables, documentation, and anything else related) is uploaded per package.

The ISDD does not provide version control; this was a conscious decision, made as part of the keep-it-simple approach. When a new version is ready, it simply replaces the old one. If the old version is to remain available as well, a new ISDD entry with a new name and version number can be created so that both remain visible.

The software packages themselves are currently kept in wafer:/disk/drookit1/infweb/osdb/packages. There are 120+ extant packages, of which the most recent upload is RABIT (the RAmsey-based Buchi automata Inclusion Testing suite), made available on 19 March 2013. The PostgreSQL database files are under wafer:/disk/data/infweb/osdb/db.

Who uses it?

Use of the ISDD has decreased of late; the download and upload statistics show little activity since 2010. A snapshot from earlier this year shows:

Top downloads:
1. SIMJAVA 2: 2162 downloads
2. HASE-III: 1936 downloads
3. WebExp2 1.3: 1331 downloads
4. RXP: 1264 downloads
5. AIAI Case-Based Reasoning Shell: 982 downloads

Recent uploads:
2013-03-19 RABIT
2010-08-09 TFInfer
2010-03-23 LTXML2 source
2010-03-23 RXP
2010-02-17 XQuery Update analysis tools

Note that, although there has been little activity in "Recent uploads", there have been some downloads over the last few weeks, as a more recent snapshot shows:

Top downloads:
1. SIMJAVA 2: 2318 downloads
2. HASE-III: 2186 downloads
3. WebExp2 1.3: 1456 downloads
4. RXP: 1391 downloads
5. AIAI Case-Based Reasoning Shell: 1007 downloads

The mechanism is therefore still in use, for both uploads and downloads, but usage is modest. Access comes from many different domains (over 500 separate external domains and sub-domains), so use is widespread, just not voluminous.
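The amount of recent download activity can be quantified directly from the two snapshots. The minimal Python sketch below simply subtracts the earlier top-five counts from the later ones. The variable and function names are illustrative only, and because the exact snapshot dates are not recorded here, no per-week rate is derived.

```python
# Top-five download counts from the two ISDD snapshots quoted above.
earlier = {
    "SIMJAVA 2": 2162,
    "HASE-III": 1936,
    "WebExp2 1.3": 1331,
    "RXP": 1264,
    "AIAI Case-Based Reasoning Shell": 982,
}
later = {
    "SIMJAVA 2": 2318,
    "HASE-III": 2186,
    "WebExp2 1.3": 1456,
    "RXP": 1391,
    "AIAI Case-Based Reasoning Shell": 1007,
}


def download_deltas(before, after):
    """Return the number of new downloads per package between two snapshots."""
    return {name: after[name] - before[name] for name in before}


deltas = download_deltas(earlier, later)
# Print packages in order of how many new downloads they received.
for name, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: +{delta}")
print("total:", sum(deltas.values()))
```

For these five packages this gives 683 additional downloads between the snapshots, led by HASE-III (+250) and SIMJAVA 2 (+156).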

Ease of Use

Although the ISDD was specifically designed to be simple and easy to use, this has not translated into significant usage; in fact, very few people use it. Only just over a third of the Informatics respondents claimed to know of its existence, only five acknowledged ever having used it at all, and no one who had not previously known about it said they would use it now.

There was some recognition, however, that it was a "very easy way to distribute software", although others felt it was a "bit complicated" and "awkward and inconvenient". The statistics-gathering aspect also received some support, but useful as that feature is, it is not strictly a usability issue.

There were also some apparent misunderstandings about licensing, with some users being unaware of the licensing options and requirements supported by the University.

Shortcomings

Several respondents pointed out that there are other ways of distributing software, and considered proprietary and public-domain package management software more useful. There also appeared to be some confusion over what constraints the ISDD imposes: some people wanted to retain control over distribution, documentation and support, apparently not realising that the ISDD does not impose such constraints.

Another issue is the lack of ongoing management, so entries can become outdated and links can break. For example, https://www.inf.ed.ac.uk/research/isdd/admin/package?view=1&id=94 references http://project.soma-band.com/, which has expired. Also, if an entry is updated or a newer version is provided, older published URLs for the same package will not reflect this, because the "id=" element will have changed.

The lack of support for version control was also seen as a shortcoming: code under development that is available through the ISDD might not be the most recent version. Although this was a deliberate design decision, it was widely perceived as a deficiency.

Although functionally useful as a distribution mechanism, the ISDD has rather a low profile among external users searching for software, and, as already noted, among internal users too.

In addition to the lack of version control, there is no provision for collaborative work on code in development. Whilst it could be argued that this is not what the ISDD was designed for, the lack of such a provision may have contributed to its low usage.

Some respondents noted that they would make useful utilities and small data sets available for download if there were a really quick and easy "one-click" or "click-through" submission mechanism; anything more makes the effort not really worthwhile.

Some responses seemed ill-informed, and some criticisms appear to be unfounded, which suggests that the documentation and usage instructions may be incomplete.

Fixes

Most of the shortcomings noted above could be fixed with some judicious changes, although how cost-effective this would be is unclear.

Possible approaches that might alleviate the reported failings include:

ongoing management

  • include an expiry time for each software entry
  • regularly contact/mail the submitter with a "still current?" message
  • automatically check included URLs
  • check that the submitter is still a UoE/Informatics member
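The expiry, URL and membership checks need very little machinery. The minimal Python sketch below assumes a hypothetical per-entry record (the `Entry` class is not the real ISDD PostgreSQL schema) that holds an expiry date and any URLs the entry references. The membership test is passed in as a function because how UoE/Informatics membership would actually be looked up is not specified here. Flagged entries could then feed the "still current?" mail to the submitter.

```python
import http.client
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Iterable, Iterator, List, Tuple
from urllib.request import Request, urlopen


@dataclass
class Entry:
    """Hypothetical record for one package entry (not the actual ISDD schema)."""
    package_id: int
    name: str
    submitter: str          # e.g. the submitter's DICE username
    expires: date           # proposed expiry time for the entry
    urls: List[str] = field(default_factory=list)


def url_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if an HTTP HEAD request to url succeeds with status < 400."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError (4xx, 5xx), timeouts, malformed URLs and
        # protocol errors are all treated as a broken link.
        return False


def review(
    entries: Iterable[Entry],
    today: date,
    is_member: Callable[[str], bool],
    url_ok: Callable[[str], bool] = url_reachable,
) -> Iterator[Tuple[Entry, List[str]]]:
    """Yield (entry, reasons) for every entry that needs the submitter's attention."""
    for entry in entries:
        reasons = []
        if entry.expires <= today:
            reasons.append("expired")
        if not is_member(entry.submitter):
            reasons.append("submitter no longer a member")
        broken = [u for u in entry.urls if not url_ok(u)]
        if broken:
            reasons.append("broken links: " + ", ".join(broken))
        if reasons:
            yield entry, reasons


if __name__ == "__main__":
    # Illustrative data only; link checking is stubbed so no network access is needed.
    sample = [
        Entry(1, "pkg-a", "alice", date(2013, 1, 1), ["http://example.org/old-project/"]),
        Entry(2, "pkg-b", "bob", date(2016, 1, 1)),
    ]
    current_members = {"bob"}
    for entry, reasons in review(sample, date(2013, 8, 28),
                                 current_members.__contains__,
                                 url_ok=lambda u: False):
        print(f"{entry.name}: {'; '.join(reasons)}")
```

With the stubbed link check, only pkg-a is flagged: it has expired, its submitter is no longer a member, and its link is treated as broken. In practice the stub would be dropped so that `url_reachable` is used, and the check run on a schedule (for example from cron).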

no versioning

  • re-assess initial "no-versioning" decision
  • incorporate versioning system?

low profile for external users

  • more publicity
  • tie-in with UoE IS

low profile for internal users

  • more publicity
  • include as part of standard procedure (mandate usage)

no collaborative working

  • not really intended for work in progress (WIP)
  • re-assess initial design decision?

misunderstood usage

  • more publicity
  • better documentation
  • enhanced submission procedure

What is "Edinburgh DataShare" and what does it do?

Overview

Based on the DSpace software, Edinburgh DataShare is "an online digital repository of multi-disciplinary research datasets" created at the University of Edinburgh. It is free at the point of use (whatever that means) and allows researchers to upload their data for sharing, licensing, and distribution.

Edinburgh DataShare acts as a "trusted repository" for research data if no other funded resources or grant-specific options are available. Deposited data will be "discoverable and accessible" by the wider research community even after the end of the project for which it was gathered or by which it was generated. A permanent identifier can also be registered with a funding body to maintain permanent access, and a "suggested citation" is provided to help with external searches.

Who uses it?

This service is open to all researchers and relevant staff within the University, and so has a large potential user-base. Within Informatics, only 7 respondents (out of 33 who answered the question) said they were aware of the Edinburgh DataShare service. There is no indication of how many other, non-Informatics, users there are.

The emphasis appears to be on research data rather than application distribution, so there may be a perception that it is not intended as a software distribution platform, although there is no indication that software deposits are either encouraged or discouraged.

Ease of Use

The Edinburgh DataShare service is centrally managed, which reduces the call on local resources. Once deposited, management of data is taken care of by Data Library staff. No transfer of ownership or other legal change is made by virtue of depositing the data.

Data submission follows a standard procedure, which does not appear too onerous and consists of:

  • grouping files into datasets ("items"), and allocating data to a "community" (a School or Research Group) and a "collection"
  • confirming support for the relevant file format, and identifying any specific software-generated data (that might have proprietary creation, editing or compression methods)
  • preparing dataset documentation (including research methodology reports and any other relevant information)
  • confirming permissions, distribution rights, and data protection constraints
  • agreeing licence type and details

As well as providing archive and distribution facilities, the Edinburgh DataShare service has a search facility to allow users to browse potentially useful data deposits.

To provide feedback to funding bodies, and for the information of data owners, usage statistics are provided to track data downloads.

Shortcomings

Although Edinburgh DataShare is a centrally managed service, awareness of the service doesn't seem much better than for the ISDD.

Like the ISDD's, the registration and initial configuration process is web-based, but the required information is much more constrained and seems slanted towards particular kinds of data collection (structured databases rather than raw data). This makes completing the predefined form when adding data items rather unintuitive.

When depositing data derived from collaborative work, these constraints make it rather difficult to obtain all the relevant details from data contributors and collaborators.

As items have a persistent identifier and a timestamp, they should not be modified much after they go live; instead, a new version should be created. This avoids the problem of changing references.

Some non-standard datasets may be "rejected" by the DataShare editors if they are not sure what to make of uncommon file formats. An "uncommon format" may simply be an informative but non-standard extension on a standard file type, for example .matlab instead of .txt, even though the file is plain text.

What do we need?

There are functions that the ISDD performs satisfactorily, functions that are not available by design, and functions that could be added to improve its usefulness. However, irrespective of any feature list, there clearly needs to be more promotion of data and application repositories (ISDD or similar) and associated tools.

Requirements for a Software Download Management Platform for use within Informatics can be grouped as follows (a mechanism to actually upload and download the software is assumed!):

Mandatory (ISDD / DataShare)

  • submitter upload (no CO intervention required): yes / yes[1]
  • restricted upload permissions: yes / yes
  • licence type selection: yes / yes
  • stats/tracking of uploaded software: yes / yes
  • stats/tracking of downloaded software: yes / yes
  • provision of basic details (description, licence, download): yes / yes
  • simple browse/search ability: yes / yes
  • basic documentation: yes / yes

Desirable (ISDD / DataShare)

  • report generation (per item, or submitter, &c): yes / not known
  • ease of licensing ("click-through"): yes[2] / no
  • version control (easy updating, integration with CMSes &c): no / not known
  • collaborative working: no / not known
  • administration of existing items/details: yes / not known
  • capture of user/licensee details: yes / (none given)
  • online payment: no / no
  • advanced browse/search ability: no[3] / yes
  • "top ten" upload/download summary: yes / yes
  • comprehensive documentation: no / yes

Notes

[1] there may be an "editor approval" step?
[2] ...available, but not exactly a one-step process
[3] ...but easy to implement?

Conclusion

If those who responded to the survey are representative of users in general, the current ISDD would appear to be not too far from what people claim to want. The main gap is the lack of an integrated versioning system, with or without a collaborative working option, and it is debatable whether this should be part of a distribution mechanism at all.

Assuming that the purpose of any distribution system is to actually distribute something, a streamlined, easy-to-use and widely advertised system should do the trick. It needs robust core functionality, a high public profile (would an IS-based service achieve a higher Google ranking than an Informatics one?), and an integrated usage policy.

-- RogerBurroughes - 28 Aug 2013

Topic revision: r4 - 09 Dec 2013 - 09:38:04 - RogerBurroughes