Development Project 137 - Improvements to Mirror System - Final Report

Summary

This project, Improvements to Mirror System, was meant to deliver some "quick win" fixes to the existing mirror service, to improve reporting and make it easier to set up mirrors. Hopefully the end user documentation (ServicesUnitMirrorService) shows that this goal has been met, though perhaps not as quickly as hoped.

The work has overrun by about 10 days, mostly due to difficulties in getting the distributed reporting to work reliably. If given the task again, some earlier design decisions would be made differently.

Work Done

An updated rmirror component (which runs on the mirror server), with a new report method and changes to the existing run method to allow selective mirroring of clients. Related support scripts were also written.

A new rmirrorclient component (which runs on the mirror client). It isn't used for much other than storing some spanning map information and providing a report method.

New header files and MACROs to simplify specifying what should be mirrored, for the client and server.

Web page that summarises the state of all the mirrors.

Issues

The main technical issue was the desire to have mirror servers and clients accessing the same dynamic data. We typically have multiple mirror servers at a site, e.g. at JCMB, so when there is a client in the Forum we don't really care which JCMB server is mirroring it, as long as one of them is. We don't want to have to hard-code the actual mirror server into the client profile, so when a client asks (via the report method) what the state of its last mirror was, we have to ask for the report from all the JCMB mirror servers and then pick out our own entry.

I did think about having a central logging service to which the servers would record the success (or otherwise) of the mirrors of their clients, so that the clients would know to look up their record from that central service. However, this seemed to introduce a single point of failure that we didn't have previously, and I liked the idea of the distributed data. So even though no one machine holds the complete state of all the mirrors, by querying them all you can build up a complete picture.
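The "query them all, then pick out our entry" step can be sketched roughly as follows. This is only an illustration under stated assumptions: the report structure (a per-server mapping of client name to a record with `status` and `finished` fields), the host names and the function name are all hypothetical, and the real reports are produced by a separate script whose format is not described here.

```python
from typing import Any, Dict, Optional

# Hypothetical report shape: server name -> {client name -> last mirror record}.
Record = Dict[str, Any]
ServerReport = Dict[str, Record]


def find_client_entry(reports: Dict[str, ServerReport], client: str) -> Optional[Record]:
    """Return the most recent record for `client` across all server reports."""
    best: Optional[Record] = None
    for server, report in reports.items():
        record = report.get(client)
        if record is None:
            continue  # this server does not mirror the client
        # Remember which server the record came from.
        candidate = dict(record, server=server)
        if best is None or candidate["finished"] > best["finished"]:
            best = candidate
    return best


# Made-up example data: two servers, each with only its own view.
reports = {
    "mirror-a": {"client1": {"status": "ok", "finished": 1327900000}},
    "mirror-b": {"client2": {"status": "failed", "finished": 1327903600}},
}
print(find_client_entry(reports, "client2"))
```

If more than one server happens to hold a record for the same client, this sketch keeps the most recent one; that tie-breaking rule is a choice made for the example only.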

Having this distributed data (each server having only its view of the mirrors) meant that I needed a way for a client (or another server) to call the report method of another mirror server. I could do this easily via ssh using my existing credentials, but if it were triggered via 'om', user credentials are ignored and so remote ssh would require a password. So rather than developing my own client-server solution, I decided to use remctl with Kerberos keytabs to authenticate and authorise the remote calls to the report methods.
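A remote report call over remctl might look something like the sketch below. The remctl command line takes a host followed by a command and subcommand; the names `rmirror` and `report` used here are assumptions, since the real ones depend on how the remctl server is configured. The sketch also assumes a valid Kerberos ticket (for example one obtained from a keytab) is already in the credential cache, and Python is used purely for illustration.

```python
import subprocess
from typing import Any, Callable, List


def report_command(server: str) -> List[str]:
    # remctl CLI form: remctl <host> <command> <subcommand>.
    # "rmirror" and "report" are hypothetical names for this example.
    return ["remctl", server, "rmirror", "report"]


def fetch_report(server: str, run: Callable[..., Any] = subprocess.run) -> str:
    """Run the remote report command and return its standard output."""
    result = run(report_command(server), capture_output=True, text=True, timeout=30)
    if result.returncode != 0:
        raise RuntimeError(f"report from {server} failed: {result.stderr.strip()}")
    return result.stdout
```

The `run` parameter exists only so the call can be exercised without a real remctl server; in normal use it defaults to `subprocess.run`.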

My ideal solution would have been for LCFG to provide dynamic, programmatically updatable spanning map resources. The standard LCFG spanning maps allow you to access client resources at compile time, so it seemed natural to be able to update an LCFG resource via the component and have that updated value accessible via the spanning map. However, I can see that this doesn't quite fit with LCFG being a configuration system.

There was another issue at the client end. We've been quite happily using the existing rsync component to make the data on the client available to the server, but I needed somewhere to store some client resources for the spanning maps and to host the report method. So the new rmirrorclient component was created. The new MACROs simplified the specifying of the rsync resources, but there were still things I couldn't do as neatly as I wanted. For example, even though the rmirrorclient has enough information to dynamically create the rsync "hosts allow" config option (listing, say, all the JCMB mirror servers), I can't expose that at the LCFG level to the rsync component. So at the moment there is a #define in one of the header files that enumerates all the mirror servers at a particular site, and this needs to be manually updated as mirror servers come and go.

Then there's also the issue that any rsync resources need to co-exist with any other use of the component on the client. If I were to do it again, I think it would be best to extend the minimal rmirrorclient component to take care of configuring its own rsync.conf file and spawning its own rsync daemon on a different port. This would avoid any potential clashes with other uses of rsync on the client, and allow a more dynamic update of the hosts allowed to rsync data from the client.
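As a rough sketch of that alternative, a component could render its own daemon configuration from the list of mirror servers it already knows about. The `port`, `path`, `read only` and `hosts allow` parameters are standard rsyncd.conf settings; the port number, module name, path, host names and function name below are made up for the example, and this is not what the current rmirrorclient does.

```python
from typing import Iterable


def render_rsyncd_conf(port: int, module: str, path: str, servers: Iterable[str]) -> str:
    """Build a minimal rsync daemon config with one read-only module."""
    lines = [
        f"port = {port}",  # non-standard port, separate from any other rsync daemon
        "",
        f"[{module}]",
        f"    path = {path}",
        "    read only = yes",
        # Only the listed mirror servers may connect to this module.
        f"    hosts allow = {' '.join(sorted(servers))}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical values for illustration only.
conf = render_rsyncd_conf(
    8873,
    "mirror",
    "/export/data",
    ["mirror-b.example.org", "mirror-a.example.org"],
)
print(conf)
```

A daemon started with something like `rsync --daemon --config=<that file>` would then listen independently of the standard rsync component, and the allowed hosts could be regenerated whenever the set of mirror servers changes.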

Effort

Two FTE weeks were allocated. Nearly four FTE weeks (19.5 days) have been used. The breakdown of effort by third is as follows:

Third      Days of effort
2010 T2     0.73
2010 T3     2.06
2011 T1     2.74
2011 T2    10.67
2011 T3     2.50
2012 T1     0.96
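The figures in the table can be checked with a few lines of Python. This is a sketch only, and the 5-day FTE week used to convert the allocation into days is an assumption. Summing the rows as tabulated gives about 19.66 days, close to the 19.5 days quoted, and an overrun of just under 10 days against the two-week allocation.

```python
# Days of effort per third, copied from the table above.
effort_days = {
    "2010 T2": 0.73,
    "2010 T3": 2.06,
    "2011 T1": 2.74,
    "2011 T2": 10.67,
    "2011 T3": 2.50,
    "2012 T1": 0.96,
}

total = round(sum(effort_days.values()), 2)

# Assumption: one FTE week is 5 working days, so two weeks is 10 days.
allocated = 2 * 5
overrun = round(total - allocated, 2)

# The third with the largest share of the effort.
busiest = max(effort_days, key=effort_days.get)

print(f"total {total} days, overrun {overrun} days, busiest third {busiest}")
```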

The bulk of the coding was complete, and the system in use, by the end of 2011 T2. The remaining effort was spent on fixes as they came to light, migration of servers to the new system and documentation.

Conclusion

The original rmirror component is a bash shell script, while the reporting part fell more naturally to being a perl script. At the time I swithered about re-coding the original script in perl, so that I didn't need to have the rmirror-report as a separate script. In hindsight I would have been better off redoing the original component in perl.

Similarly, the rmirrorclient component should do all the work, rather than just configuring the existing rsync component. It would probably still use the rsync daemon to take care of the actual data transfer, but would run on a non-standard port, so we could configure it independently of any possible other use of the standard rsync component.

Again it's quite disappointing to see how a 2 week project turned into 4 weeks of effort spread over nearly 2 years! I suspect this was because the existing system basically worked and there was no mission critical need to get the work done, so it kept being prioritised below other tasks.

-- NeilBrown - 30 Jan 2012

Topic revision: r3 - 31 Jan 2012 - 16:04:24 - NeilBrown
 