Ongoing work

This is the page where I will report on short-term progress and ongoing work, while the pages under "Work Plan and Progress Reports" are intended to give a more global and structured picture. -- VolkerStrom - 13 Jun 2007

3 Jun 2007

Replicating the experiment of Beutnagel and Conkie on several existing Festival 2 voices: 517,009 sentences from The Herald are being synthesised with voice_cstr_nick_multisyn.devel, voice_cstr_rpx_roger_multisyn_h (the current 13-hour version) and voice_cstr_rpx_roger_multisyn_i (all artificial text left out), all with default prosody.
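Such a batch run can be driven from outside Festival with text2wave; a minimal sketch (the sentence file herald.txt, the temp file and the output directory are hypothetical; the voice name is one of those above):

    import os
    import subprocess

    VOICE = "voice_cstr_rpx_roger_multisyn_h"  # one of the voices listed above
    SENTS = "herald.txt"                       # hypothetical: one sentence per line
    OUTDIR = "wav"                             # hypothetical output directory

    os.makedirs(OUTDIR, exist_ok=True)
    with open(SENTS) as f:
        for i, sentence in enumerate(f):
            with open("tmp.txt", "w") as tmp:
                tmp.write(sentence)
            # text2wave ships with Festival; -eval selects the voice
            subprocess.run(["text2wave", "-eval", "(%s)" % VOICE,
                            "-o", os.path.join(OUTDIR, "sent.%06d.wav" % i),
                            "tmp.txt"], check=True)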

11 Jun - 24 Jul

Unit source plus total target and join cost is not detailed enough information for interesting statistics. I changed the scripts so that they also recalculate and dump all target and join cost components. This is too slow to run on just a few machines. Also, Festival always crashes after a while, probably because of a memory leak (I looked for it in vain), so the experiment needs to be distributed. Submitted 15,000 jobs * 30 sentences (having thrown away the 67k longest ones) to townhill, which failed, probably because of bad scaling. Details for this period can be found in ~/texte/ascii/info/cstr/attaca/WORKING_NOTES.
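The distribution itself amounts to splitting the sentence list into fixed-size chunks and submitting one job per chunk; a sketch, assuming an SGE-style qsub and a hypothetical per-chunk wrapper script synth_chunk.sh:

    import subprocess

    CHUNK = 30  # sentences per job, as above
    with open("herald.txt") as f:      # hypothetical sentence list
        sents = f.read().splitlines()

    for j in range(0, len(sents), CHUNK):
        chunk_file = "chunk.%05d.txt" % (j // CHUNK)
        with open(chunk_file, "w") as f:
            f.write("\n".join(sents[j:j + CHUNK]) + "\n")
        # synth_chunk.sh (hypothetical) runs Festival on one chunk; a fresh
        # Festival process per chunk also sidesteps the suspected memory leak
        subprocess.run(["qsub", "synth_chunk.sh", chunk_file], check=True)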

15 Oct 2007

I overhauled the scripts for the bigger Eddie cluster, and after Eddie was patched and rebooted on Oct 10 it runs stably enough for testing and for optimising the size of each job. 450,000 sentences from The Herald are being synthesised with roger_h and expro_target_cost. Each of 1,500 jobs covers 300 sentences and dumps the utterance structure (*.utt), where the units come from (*.dump), all target cost components (*.tcc) and all join cost components (*.jcc). After being written to the local scratch disk, the files are individually zipped and (because 2 million loose files are too unwieldy) bundled into 1,500 tar files "job.%05d.tar" in the group space /exports/work/informatics/festival/usel_stat/roger_h. With the current load of the cluster, the jobs should be done in less than 3 days.
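The bundling step could look like the following sketch; the scratch directory is a placeholder, the destination is the group space named above, and gzip is assumed as the compressor:

    import glob
    import gzip
    import os
    import shutil
    import tarfile

    SCRATCH = "/scratch/usel"  # hypothetical local scratch directory
    DEST = "/exports/work/informatics/festival/usel_stat/roger_h"

    def bundle(job_id):
        """Gzip each dump file individually, then collect them in one tar."""
        tar_name = os.path.join(DEST, "job.%05d.tar" % job_id)
        with tarfile.open(tar_name, "w") as tar:
            for ext in ("utt", "dump", "tcc", "jcc"):
                for path in sorted(glob.glob(os.path.join(SCRATCH, "*." + ext))):
                    gz = path + ".gz"
                    with open(path, "rb") as fin, gzip.open(gz, "wb") as fout:
                        shutil.copyfileobj(fin, fout)
                    tar.add(gz, arcname=os.path.basename(gz))
                    os.remove(path)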

5 Nov 2007

It took longer than 3 days, and 231 jobs exceeded the time limit of 6 hrs. Re-submitted them with the same time limit; all but 61 completed. Re-submitted the rest as 24h jobs; 9 of them still did not finish. Runtime seems to be rather arbitrary. Furthermore, 22 of the "job.%05d.tar" files consisted of 10240 \0s. Remaking them worked fine; no idea what went wrong. The best strategy seems to be an easy-to-use mechanism for sanity-checking, cleaning up and re-submitting the failed or missing jobs, iterated a few times. As of today, 1498 of the 1500 job.%05d.tar files are done.
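A minimal version of such a sanity check, assuming qsub and a hypothetical re-submit wrapper resubmit_job.sh (a tar consisting only of \0s is not a valid tar, so tarfile.is_tarfile catches that failure mode too):

    import os
    import subprocess
    import tarfile

    DEST = "/exports/work/informatics/festival/usel_stat/roger_h"

    bad = []
    for job_id in range(1500):  # job numbering assumed 0-based
        tar_name = os.path.join(DEST, "job.%05d.tar" % job_id)
        # a job failed if its tar is missing, invalid (e.g. all \0s),
        # or cannot be read to the end
        if not os.path.exists(tar_name) or not tarfile.is_tarfile(tar_name):
            bad.append(job_id)
            continue
        try:
            with tarfile.open(tar_name) as tar:
                tar.getmembers()
        except tarfile.TarError:
            bad.append(job_id)

    for job_id in bad:
        subprocess.run(["qsub", "resubmit_job.sh", str(job_id)], check=True)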

In the meantime, a set of 8 statistics has been calculated for each job.%05d.tar and stored in stat_%05d.tar, in the form of tables and histograms; see /home/vstrom/software/cstr/scripts/festival_tools/usel_stat/README for details. For each table and histogram format there is now a script which performs the "table1 += table2" operation, and another which iterates it over a given set of input stat_%05d.tar files, so that the accumulation of statistics can be distributed over the cluster.
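At its core, "table1 += table2" is a histogram merge; a sketch, assuming (hypothetically) that each table is stored as plain "key count" lines (the actual formats are the ones described in the README above):

    from collections import Counter

    def read_table(path):
        """Read a table of 'key count' lines into a Counter."""
        table = Counter()
        with open(path) as f:
            for line in f:
                if not line.strip():
                    continue
                key, count = line.rsplit(None, 1)
                table[key] += int(count)
        return table

    def accumulate(paths):
        """total += table, iterated over a set of input tables."""
        total = Counter()
        for path in paths:
            total += read_table(path)
        return total

Because the merge is associative, partial sums over disjoint subsets of the stat_%05d.tar files can run as independent cluster jobs and be merged in a final step.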
