The scripts are in the gmtk_tools CVS tree under gmtk_tools/scripts/triangulateGA_SGE/triangulateGA.pl. You'll need to set an environment variable GMTK_TOOLS to point to the root of your local copy of that tree.
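As a minimal sketch of that setup step: the checkout location below is a hypothetical placeholder, not a path from this page. The script path is given above as scripts/triangulateGA_SGE/triangulateGA.pl, while the example call uses scripts/triangulateGA.pl, so the sketch simply checks both.

```shell
# Hypothetical checkout location; replace with the root of your own copy.
export GMTK_TOOLS="$HOME/src/gmtk_tools"

# Report which of the two script locations mentioned on this page exists.
found=0
for candidate in scripts/triangulateGA.pl scripts/triangulateGA_SGE/triangulateGA.pl; do
  if [ -f "$GMTK_TOOLS/$candidate" ]; then
    echo "found: $GMTK_TOOLS/$candidate"
    found=1
  fi
done
if [ "$found" -eq 0 ]; then
  echo "triangulateGA.pl not found under $GMTK_TOOLS" >&2
fi
```

Putting the export line in your shell startup file saves retyping it in every session.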
An example call:
$GMTK_TOOLS/scripts/triangulateGA.pl \
  -strFile PARAMS/timit_training.with_gender.str \
  -timingExportLine "qrsh -b y -cwd " \
  -timingScript "$GMTK_TOOLS/bin/gmtkTime -probE -fmt1 htk -nf1 39 -ni1 0 -of1 DATA/training_observations.scp -inputMasterFile PARAMS/nonTrainable.with_gender.master -strFile PARAMS/timit_training.with_gender.str -inputTrainableParameters PARAMS/gender_sensitive_models/model_5.gmp" \
  -iswp1 T -nf1 39 -ni1 0 -fmt1 htk -of1 DATA/training_observations.scp \
  -long \
  -inputMasterFile PARAMS/nonTrainable.with_gender.master \
  -inputTrainableParameters PARAMS/gender_sensitive_models/model_5.gmp \
  -outputDirectory genderTrainingTriangulations/ \
  -parallelism 40 \
  -seconds 30 \
  -useExistingBoundaries \
  genderTrainingTriangulations/timit_training.with_gender.str.trifile > LOGS/triangulateGA/genderTrainer.stdout 2> LOGS/triangulateGA/genderTrainer.stderr
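The redirections at the end of the example call only work if LOGS/triangulateGA/ already exists, since the shell does not create missing directories for a redirect target. A small preparation sketch follows. Whether the script creates the -outputDirectory itself is not stated here, so creating it up front is only a precaution.

```shell
# Create the log directory used by the > and 2> redirections.
mkdir -p LOGS/triangulateGA

# Precaution (assumption): also create the directory passed to -outputDirectory.
mkdir -p genderTrainingTriangulations
```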
The arguments mean...
- A lot of the options have the same meaning as they do under gmtk.
- -seconds tells it how many seconds to allow each timing run to last. Too short and it won't get through many chunk frames or reach the epilogue; too long and the whole search takes too long. 30 seconds was a figure that made sense for my structure, in that the speed metric (partitions/sec) levelled out there.
- -parallelism 40 tells it to create 20 timing threads when parallelising the benchmarking. The script halves the number you give it; I'm not sure why.
- -long sets a bunch of internal parameters indicating how long the run should take (longer means potentially better). The other options are -medium and -short (the default).
- You can optionally provide a triangulation to start from (genderTrainingTriangulations/timit_training.with_gender.str.trifile above). This could be the output of a previous run.
- Unless you state -useExistingBoundaries it'll start with a boundary search. The boundary search is also distributed using SGE. You'll need to do a boundary search the first time round, but after that you can use the -useExistingBoundaries option.
- The output turns up in the directory you ran from, named strfile_name.best.trifile. It is updated with the current best triangulation as the script goes on, so you can use one produced halfway through the run if you're feeling impatient.
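To grab the current best triangulation partway through a run, something like the sketch below could be used. It assumes that strfile_name means the basename of the -strFile argument, and the snapshot file name is a hypothetical choice. Copying while the script happens to be rewriting the file could capture a partial file, so it is worth checking the copy before relying on it.

```shell
str_file="PARAMS/timit_training.with_gender.str"

# Assumption: the output is named after the basename of the structure file.
best="$(basename "$str_file").best.trifile"

# Hypothetical snapshot name, kept next to the other triangulations.
snapshot="genderTrainingTriangulations/$(basename "$str_file").snapshot.trifile"

if [ -s "$best" ]; then
  mkdir -p "$(dirname "$snapshot")"
  cp "$best" "$snapshot"
  echo "saved $snapshot"
else
  echo "no $best in $(pwd) yet" >&2
fi
```

The snapshot can then be given as the starting triangulation for a later run, in the same position as the .trifile argument at the end of the example call.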
-- Main.s0565860 - 19 Jun 2006
Topic revision: r3 - 20 Jun 2006 - 16:52:18 - Main.s0565860