Final Report: Port Research and Teaching Packages to SL7 (353)

Description

Port all research and teaching packages to SL7 to maintain and update DICE as a computing environment fit for the purposes of all members of the School. Where practicable, software should be brought up to the most recent supported version, and where software cannot be ported a suitable alternative should be suggested.

Customer

Academic teaching and research staff and all students

Deliverables

A full set of packages (and their dependencies) to:

  • Fulfill the software requirements for each course as stated by course teaching staff.
  • Provide software for research staff to match that in the dice_sl6_rat.rpms and dice_sl6_rat_updates.rpms package lists.

Time

The project took approximately 22 weeks FTE of CO time and approximately 4 weeks FTE of CSO time. Project work was done in three "waves" of effort corresponding to the initial port, semester 1 teaching and semester 2 teaching.

Given past experience of dealing with the teaching software, it was decided to roll this work into the annual teaching software refresh to try and minimise any duplicated effort.

Procedure

Known "problem packages" were allocated amongst the RAT C(S)O and initially set to one side as it was assumed that these would need upgrading and were likely to be fairly involved. Particularly those with a large dependency base and those like haskell which had some form of self bootstrapping procedure.

The first cut for the main group was to process the existing RAT package lists and search for corresponding RPMs in the SL7 and EPEL repositories. This generated four lists of RPMs:

  1. RPMs which were not automatically available.
  2. RPMs where the SL6 version was the same as the SL7 version.
  3. RPMs where the SL6 version was older than the SL7 version.
  4. RPMs where the SL6 version was more up to date than that available in SL7.

These were used to create the table at https://wiki.inf.ed.ac.uk/DICE/TeachingSoftware2015 and to generate information for a mailshot to all course lecturers asking them to confirm their requirements. For each package we did a quick review of the available versions and passed this information on to the relevant teaching staff.

The lists from 2 and 3 above were used to generate an initial set of SL7 RAT package lists.
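
For illustration, the comparison step looked roughly like the following Python sketch. This is not the script we actually ran: the classify() function, the dictionary-shaped inputs and the example data are assumptions for the example, although the version comparison does use the standard rpm Python bindings.

    # Illustrative sketch only: classify SL6 RAT packages against what is
    # available in the SL7/EPEL repositories, producing the four lists above.
    # Assumes the package lists have been flattened into dictionaries mapping
    # package name -> (epoch, version, release) strings.
    import rpm  # python-rpm bindings, for RPM-aware version comparison

    def classify(sl6_pkgs, sl7_pkgs):
        """Return the four lists: missing, same, SL6-older, SL6-newer."""
        missing, same, sl6_older, sl6_newer = [], [], [], []
        for name, sl6_evr in sorted(sl6_pkgs.items()):
            sl7_evr = sl7_pkgs.get(name)
            if sl7_evr is None:
                missing.append(name)        # list 1: not automatically available
            elif rpm.labelCompare(sl6_evr, sl7_evr) == 0:
                same.append(name)           # list 2: versions identical
            elif rpm.labelCompare(sl6_evr, sl7_evr) < 0:
                sl6_older.append(name)      # list 3: SL7/EPEL version is newer
            else:
                sl6_newer.append(name)      # list 4: our SL6 build is newer
        return missing, same, sl6_older, sl6_newer

    # Hypothetical example data; the real input came from the RAT package
    # lists and the repository metadata.
    sl6 = {"NuSMV": ("0", "2.5.4", "1"), "xv": ("0", "3.10a", "23")}
    sl7 = {"NuSMV": ("0", "2.6.0", "1")}
    print(classify(sl6, sl7))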

Following on from experience of building the Haskell platform RPMs, which have a couple of circular dependencies hidden in an RPM dependency tree of around 300 RPMs, a two-desktop build farm was set up. Each host ran a script which repeatedly attempted to build SRPMs from lists 1 and 4 above, with the second host working in reverse alphabetical order. In addition, each host ran a periodic cron job as root which iterated through the built RPMs, trying to yum install them and running yum-builddep against the directory of spec files this all generated. Build "tangles" (missing dependencies, renamed packages, circular dependencies etc.) were fixed by hand, and periodically batches of SRPMs were sent to pkgforge to generate validated builds. These builds were then included in the RAT package lists and the corresponding SRPMs were taken out of the build pile.
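
The build loop itself was conceptually simple; a rough Python sketch of one host's sweep is shown below. The production scripts differed in detail, and the directory path, the ".done" renaming convention and the use of the -y option are assumptions made for the example; yum-builddep and rpmbuild --rebuild are the real tools involved.

    # Rough sketch of one build host's sweep over the pile of SRPMs: try to
    # satisfy build dependencies and rebuild each one, move successes out of
    # the pile, and keep sweeping until a pass makes no further progress.
    # Built RPMs are installed separately (the periodic root cron job above).
    import glob, os, subprocess

    def try_build(srpm):
        """Attempt to satisfy build deps and rebuild one SRPM; True on success."""
        subprocess.call(["yum-builddep", "-y", srpm])
        return subprocess.call(["rpmbuild", "--rebuild", srpm]) == 0

    def sweep(srpm_dir, reverse=False):
        """One pass over the build pile; returns the SRPMs that built."""
        built = []
        for srpm in sorted(glob.glob(os.path.join(srpm_dir, "*.src.rpm")),
                           reverse=reverse):  # second host ran in reverse order
            if try_build(srpm):
                built.append(srpm)
                os.rename(srpm, srpm + ".done")  # take it out of the pile
        return built

    # Keep sweeping until nothing new builds; whatever remains is a "tangle"
    # (missing deps, renamed packages, circular deps) to untangle by hand.
    while sweep("/var/tmp/build-pile"):
        pass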

Throughout this process software requirements would be revised or signed off by course lecturers, which would either fix versions or result in updated SRPMs being thrown into the build pile. In addition, each non-teaching package was reviewed and, where a more up-to-date version was available as an SRPM, this was acquired and thrown into the build pile. Here we were relying on research staff complaining if they depended on older software, since it was impossible to track who used a given piece of software; in the event no one complained.

Additional software requests would come in via RT, and these, along with existing tickets in the requested software queue, would be reviewed; where SRPMs were available these were again thrown on the build pile.

Finally, there were a couple of stock RPMs which were rebuilt in order to pick up libraries that were not shipped in SL7 by default (audacity and other music players for MP3 support, numpy for OpenBLAS).

We did not count the number of RPMs that were built, but we estimate that upwards of 1000 individual RPMs were repeatedly built during this process.

Where it was not possible to build software for SL7 we intended to inform users where there was an RT ticket to respond to or, in line with the rest of the upgrade, to quietly drop the software where there was no user to inform. In the event only xv fell into this category, and it was in fact rescued by a final Google search which found someone mad enough to patch it for 64-bit.

Problems

There were no particular issues with building any RPMs; most of the problems were with late requests. These were:

  • Extreme Computing, and in particular the massive expansion of the course.
  • Secure Programming
  • NuSMV for AR

Extreme Computing

In many ways this was the perennial nightmare of teaching software: a course allocated at the last minute to a new lecturer, involving a complex piece of software and large amounts of (non-standard) hardware, for a popular mixed course of MSc students and undergraduates with no cap on the student numbers involved, all whilst in the middle of an OS upgrade. The only real saving grace was that it was software which we had recent experience of and which we had already started to port to SL7. Up until we were contacted by the course organiser we had been looking to retire the old Hadoop cluster; indeed, we had recently binned all the rack-mounted nodes and were shrinking the cluster down with a view to leaving a very small cluster for testing and development only. As the course organiser was not in place when the email call went out, he was unaware of the process. Given the lack of response and no notional course lecturer, we assumed that the course was not going to run this year. We had done some Hadoop development on SL7 with the CDT machines but at that point had not had a full cluster running.

Software-wise, we risked an upgrade to the latest version of Hadoop and an upgrade to SL7 to match the lab machines. By using the CDT machines to prototype what we wanted to do, we could keep the existing cluster as a fallback position. We initially assumed that course numbers would be on a par with the previous year; however, it soon became apparent that this would be a large class. This seems largely to have been because another course did not run as normal and most of its students transferred over to Extreme Computing.

In terms of the hardware, we grabbed all the available "spare" desktops. The small form factor HP 7900s were particularly useful as it was possible to double up memory and hard disks, significantly increasing the available disk space. The ultra small form factor machines were less useful: although they had a smaller footprint and power draw, they had less memory and disk space, and because they used external power bricks on 3-pin plugs we were limited in how many we could deploy. As the possible student numbers went up from 200 to 300 we negotiated access to some of the CDT nodes, which roughly doubled the available disk space and increased the core count by half. As it turned out, the CDT servers did the lion's share of the processing. Although this was all in place for the first lab, we had loading issues throughout and would have had serious problems during the coursework periods without the CDT machines.

Secure Programming

We got a very late request from the TA for the Android SDK, which we were able to provide in time (just) for the labs, although not in an ideal configuration for the students.

NuSMV for AR

This was a combination of circumstances and a mistake on our part. Our mistake was the wording of the reminder, which asked the lecturers to confirm any changes. The unfortunate circumstances were that:
  • The software did not automatically build
  • We had a reply from the course coordinator and hence did not flag a null response to chase.

Lessons learned & conclusions

Despite us raising the question of compute resources at the creation of the Extreme Computing course, and again when we had issues previously, the University (College?) will not allow us to set a cap on the number of students who can sign up for the course. TPTB regard the computing resources as "off the shelf" and therefore as something which can be purchased as required. Whilst this is possibly true in theory, it leaves very little leeway in practice and assumes that there is sufficient racking, power and networking in place to cope with servers which would have to be very rapidly specced, ordered and then installed.

There is probably little we can do other than monitor the situation and try to react swiftly. Although 2015-2016 is likely to have been a one-off, the lack of any cap on those enrolled on the course, and the fact that the course is open to all University students, means that this could happen again at any point; and of course the course that was cancelled last year could be cancelled again. Stockpiling older machines for use in such circumstances is unlikely to be of much use, and if it were not possible to purchase new kit we would have to look at switching over current lab machines for dedicated use in the cluster.

In terms of the Secure Programming course we hear an annoyingly common refrain: "we were not quite sure what we wanted to do and it's only just come together". In this case we believe the problem was aggravated by the fact that the hardware had only just arrived. We also often hear the argument that academic staff are reluctant to involve Support/RAT before they finalise their requirements because they don't want to waste sysadmin time investigating dead ends. Whilst this may be well intentioned, the argument is flawed. It is usually possible to work out fairly rapidly how involved installing a piece of software would be. For awkward software we can plan for the contingency of installing it and give feedback about likely installation deadlines. In the case of easy requests the software can simply be installed, and even if it is not used for teaching it may be used by other staff. The only scenario in which large amounts of sysadmin time are wasted is when complicated software is requested just too late to be installed.

We have through the years tried to nail down requirements by identifying who is responsible for the teaching and trying to push the responsibility for specifying the software for a course onto one specific person. Given that our main problems are with software which is requested very late, there have been suggestions that we should introduce a hard submission deadline, as IS do for their software labs: a date after which we would not accept any requests, and which would be advertised as such.

Although I have found this attractive, I think that fundamentally it would be a bad idea:

  • In terms of hardware-resource-constrained courses like Extreme Computing, we cannot provide a workable deadline because we do not know the numbers involved until after the start of the course and we are unable to limit them beforehand; in this respect we are committed like the pig in the old bacon-and-eggs fable: we have to accept the consequences when breakfast is eventually served.
  • In setting a catch-all deadline we would inevitably be setting an arbitrary one. It would then become difficult to justify not installing something which was particularly straightforward and this would place us in a no-win situation. If we allow the request then the deadline becomes porous and can be ignored. If we stick to our guns and deny it then we are perceived as rigid and obstructive.
  • Where lecturers are appointed late (either in being assigned to the course or in joining the staff) they may be appointed after our arbitrary deadline, and inevitably we would have to treat such cases on a case-by-case basis.
  • Arguably, by setting a deadline by which software must be requested we are also setting a timeframe within which we are guaranteeing that software will be installed. There may be a perception amongst academic staff that even if they submit a completely unreasonable request before the deadline, that request must be met.

I also think it would be a bad idea to set hard deadlines precisely because IS already do this. Being closer to the academic coal face than IS, we ought to be more flexible and responsive than an organisation that has to deal with a much larger client base; if we are not, then what good are we? I think it would be better to open up the detail of the work involved to the academic staff, so that they are aware of how busy we are likely to be and it is obvious what stage we have reached with putting out any of the software.

In trying to formalise the whole notification, request-gathering, software-building and installation process we have broken one of the rules we have set up for system configuration: that for each piece of configuration there should be one definitive source. Or perhaps, rather, we have obscured this from the academics requesting the software. We provide them with information about what software was requested for their course last time round, and we do so during a specific window when a course may be being handed over to another academic. What we do not currently provide them with is access to the wiki page which RAT use to hold the current state of all the teaching software.

If we use the RAT wiki page, or an equivalent:

  • We can send out broadcast messages (possibly in addition to targeted emails) pointing academic staff/TAs at the page and asking them to check that it is correct for their courses and get them to update it themselves.
  • It is easier for new starts to pick up the correct procedure: they can be pointed at the wiki when they are given their course, and they will also get any general emails.
  • Where courses are handed over after the targeted emails have gone out, the requests should not get lost between pillar and post.
  • We have the opportunity to provide feedback or nag the teaching body as a whole rather than spending time trying to work out who might be running a specific course.

-- IainRae - 16 Jun 2016
