-- Main.simonk - 11 May 2006



From bilmes@ssli-mail.ee.washington.edu Mon Jul 26 09:19:46 2004
Date: Mon, 26 Jul 2004 01:19:34 -0700
From: Jeff Bilmes <bilmes@ssli-mail.ee.washington.edu>
To: gmtk-users@ssli-mail.ee.washington.edu
Subject: [Gmtk-users] gmtk dev tag update

Major change:

- new clique beam pruning option that prunes out all but the
top k entries in a clique. This lets the user upper-bound the computation
much more easily than before. Give programs the -ckbeam <int> option, where
<int> is the number of entries in a clique to retain. To get an idea of
how many clique entries you have, run with -verb 70.

Note that in some graphs, this can give a huge speedup!! (but it
of course is not exact inference any longer).
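
The top-k idea behind -ckbeam can be illustrated with a minimal Python sketch. This is not GMTK's implementation: the function name prune_clique_top_k and the representation of a clique as a dictionary from assignments to log scores are assumptions made only for illustration.

```python
import heapq

def prune_clique_top_k(entries, k):
    """Keep the k highest-scoring clique entries (hypothetical representation).

    entries: dict mapping an assignment (tuple of RV values) to its log score.
    k: number of entries to retain, playing the role of the -ckbeam value.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # nlargest returns at most k (assignment, score) pairs, best first.
    kept = heapq.nlargest(k, entries.items(), key=lambda item: item[1])
    return dict(kept)

# Made-up clique with four entries; keep the best two.
clique = {(0, 1): -2.3, (1, 1): -0.4, (2, 0): -7.9, (0, 0): -1.1}
pruned = prune_clique_top_k(clique, 2)
```

Whatever the original clique size, at most k entries survive, which is why the computation can be bounded in advance.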

Minor changes:

- CPT speedups
- close to finishing virtual evidence CPT and DT formulas with child
information (child frame, cardinality).

I haven't really tested this version, so there might be bugs, but please
give it a try, especially -ckbeam!!

-- Jeff

+======================================================================+
|            Jeff A. Bilmes, Assistant Professor                       |
| Dept. of EE              Voice: (206) 221-5236                       |
| University of Washington FAX: (206) 543-3842                         |
| Box 352500               Email: bilmes@ee.washington.edu             |
| Seattle, WA  98195-2500  http://www.ee.washington.edu/faculty/bilmes |
+======================================================================+
_______________________________________________
Gmtk-users mailing list
Gmtk-users@ssli.ee.washington.edu
https://ssli.ee.washington.edu/mailman/listinfo/gmtk-users

From bilmes@ssli-mail.ee.washington.edu Mon Jul 26 23:15:33 2004
Date: Mon, 26 Jul 2004 15:15:27 -0700
From: Jeff Bilmes <bilmes@ssli-mail.ee.washington.edu>
To: gmtk-users@ssli-mail.ee.washington.edu
Subject: [Gmtk-users] Another update: triangulation scores

I should also mention that in the latest dev tag, there also is a new
triangulation score heuristic.  I'm not sure it will work better than
the old one, and you will need to tune it, but it can re-find
completed. The new options are:

-jtwUB

along with either or both of

-jtwSNSC scale1  -jtwDNSC scale2

The first option turns on the upper-bound score, which by itself
is just like before. The 2nd two parameters determine the tightness
of the "upper bound" (although as soon as the two scales are < 1, it is
potentially no longer an upper bound).

The first one (scale1) scales the charge of the sparse/deterministic
nodes coming into a clique, and you probably want to decrease it roughly
based on how much determinism you have in the graph. The second one
(scale2) scales the charge of the dense nodes in a clique, and you
should reduce it roughly according to how much pruning you plan to do
(i.e., reduce jtwDNSC if you decrease one of the beam pruning
parameters).

The scales can be positive or negative, i.e., you can use:

-jtwUB -jtwSNSC -4.0

(so it's of course no longer a "weight"; it's really a "score" you are
trying to minimize, and you might get some negative weights this way).

Using -4.0 above, it re-found completed (which is good I suppose) but
again I'm sure there are better ones out there (since it still didn't
find the 2-completed case in the E partition).

But if you happen to find good scale parameters for the above, let me
know! (again, no guarantees, the right thing to do is to use the new
program which doesn't yet exist).

Let me know if this works for anyone.

Next up: virtual evidence CPTs.

-- Jeff


From bilmes@ssli-mail.ee.washington.edu Sat Sep  4 06:12:27 2004
Date: Fri, 03 Sep 2004 22:12:19 -0700
From: Jeff Bilmes <bilmes@ssli-mail.ee.washington.edu>
To: gmtk-users@ssli-mail.ee.washington.edu
Subject: [Gmtk-users] GMTK dev tag update, major new features

Folks,

I've done a bunch of new work on GMTK recently, and the dev tag has now
been updated. This includes:

Source code changes:

1) A completely new RV hierarchy, almost all of the code has changed
in a (quite significant) way. As a result, the old versions
(gmtkEMTrain, gmtkViterbi, gmtkScore) are no longer supported and
are not included (and will not even compile).  The good news is that
this change gives about a 10-20% speedup on all the graphs I've
tried over the previous new versions.
2) new simplified, more flexible, and faster CPT interface (useful
for implementing new forms of CPTs).

Bug fixes:

1) working viterbi island algorithm (the previous one had a bug which is now fixed)

Major new user visible features:

Summary:

1) There are immediate constants for RVs, namely:   frameNum, numFrames, segmentNum, numSegments
2) switching weights (including scales, penalties, and shifts) + word insertion penalties
3) virtual evidence CPTs + Hybrid DBN/{ANN,SVM}
4) better and more informative error messages (most now include file line number location).
5) better -debug values and much better and more informative verbose output with -verb 50 (or 60, 65, 70, and 80)
6) smarter gmtkTime with -multi option, no more thrashing and crashing when timing bad triangulations

Check out the latest dev tag to get a working version.

Details on the user visible changes:

1) There are immediate constants for RVs, namely:   frameNum, numFrames, segmentNum, numSegments
--------------------------------------------------------------------------------------------------

This means that what you can do is define an observed constant
variable with an immediate observation, but rather than give it an
integer, you can use these 4 keywords. E.g.,

type: discrete observed value frameNum cardinality 300;

So you can define a frame variable as in:

variable : frame {
type: discrete observed value frameNum cardinality 200 ;
switchingparents: nil;
conditionalparents: nil;
}

The only thing is you need to make sure cardinality is big enough
as otherwise, you'll get a run-time error. Also, since these are
observed, they really don't cost anything.

2) switching weights (including scales, penalties, and shifts) + word insertion penalties
----------------------------------------------------------------

Here's an example:

variable : word {
type: discrete hidden cardinality VOCAB_SIZE ;
switchingparents: wordTransition(-1)
using mapping("directMappingWithOneParent");
conditionalparents:
word(-1) using DeterministicCPT("copyCPT")
| word(-1) using DenseCPT("wordBigram");
weight:
nil
| penalty -100 ;
}

This gives a switching weight, and adds a penalty in the 2nd case
(when we insert a new word). Penalties are given as log base e
(natural log), so that we can implement true word insertion penalties
as in:

W* = argmax_W ( log p(X|W) + \alpha \log p(W) + n(W)*\beta )

where \alpha is the scale, and \beta is the word insertion penalty,
and n(W) is number of words in W (so the more words we choose, the
more the penalty if we choose \beta to be negative).
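
The decoding criterion above can be sketched in a few lines of Python. The hypothesis list and its scores are made-up numbers, and the function names are hypothetical; the sketch only shows how \alpha and \beta enter the score.

```python
def hypothesis_score(log_p_x_given_w, log_p_w, num_words, alpha, beta):
    """log p(X|W) + alpha * log p(W) + n(W) * beta, all in natural log."""
    return log_p_x_given_w + alpha * log_p_w + num_words * beta

def best_hypothesis(hypotheses, alpha, beta):
    """hypotheses: list of (words, log p(X|W), log p(W)) tuples (made-up data)."""
    return max(
        hypotheses,
        key=lambda h: hypothesis_score(h[1], h[2], len(h[0]), alpha, beta),
    )

hyps = [
    (["a", "cat"], -100.0, -8.0),       # two words
    (["a", "c", "at"], -95.0, -9.0),    # three words, better acoustic score
]
no_penalty = best_hypothesis(hyps, alpha=1.0, beta=0.0)
with_penalty = best_hypothesis(hyps, alpha=1.0, beta=-5.0)
```

With beta = 0 the three-word hypothesis wins; with beta = -5 the extra word costs enough that the two-word hypothesis wins instead.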

Note that GMTK now also supports scale and offset. I.e., given a
probability p, under a particular switching condition, we can modify p
to be:

penalty*p^scale+shift

The syntax is:

weight:
nil
| penalty -100 scale -2 shift -5
| scale -2
| shift 0:0;

and so on. The order of penalty scale shift doesn't matter. The
'int:int' notation means get the number from the global observation
matrix at that position at the frame number of the child (so this is
quite flexible, as any of penalty, scale, and shift can come from the
observation file at the frame of the child variable).

Here's what precisely penalty, scale, and shift do (as given by their
values in the .str file) in the log domain:

Penalty:
log(p) + penalty

Scale:

log(p) * scale

Shift:

so the scale directly multiplies the log prob. i.e., scale*log(p) so
scale can be positive or negative.
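
A small Python sketch of the penalty and scale operations in the log domain follows. The function name apply_weight is hypothetical, and applying scale before penalty is an assumption read off the penalty*p^scale form given earlier. Shift is deliberately left out, since its log-domain form is not spelled out here.

```python
import math

def apply_weight(log_p, penalty=0.0, scale=1.0):
    """Return scale * log(p) + penalty (shift is deliberately not modelled)."""
    return scale * log_p + penalty

log_p = math.log(0.5)
penalised = apply_weight(log_p, penalty=-100.0)   # log(p) + penalty
scaled = apply_weight(log_p, scale=-2.0)          # scale * log(p)
```

In the probability domain, a log-domain penalty of log(3) with scale 2 corresponds to 3 * p^2.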

3) virtual evidence CPTs + Hybrid DBN/{ANN,SVM}
------------------------------------------------------

This allows GMTK to implement dynamic virtual evidence, so gives GMTK
the ability to do Hybrid ANN/HMM speech recognition (ANN = artificial
neural network), aka Morgan/Bourlard, and more generally
hybrid DBN/ANN and/or hybrid DBN/SVM systems.

A VirtualEvidenceCPT is just like a regular GMTK object. The master
file word is:

VE_CPT_IN_FILE

and like before, VECPTs can either be inline or contained in another
file. Also, you index and number the CPTs like all other GMTK objects.

So you have:

<num VECPTS>

0
<VECPT spec 0>

1
<VECPT spec 1>

and so on.

Each VECPT spec is:

<VECPT name>
<num parents> % must be == 1
<parent cardinality>
<self card> % must be == 2
<obsfile> % name of a file, just like the observation file on the command line, so it
% can be pfile, htk, ascii, or binary
<nflts>   % number floats
<nints>   % number ints
<f_rng>   % float range to use from within file
<i_rng>   % int range to use from within file
<pr_rs>   % per segment range. Not currently used, just use 'all' for now.
<fmt>     % file format (same as command line)
<swp>     % Endian swapping condition, T or F

There are more constraints. Either

A: nints == 0 and nflts (after applying f_rng) must be equal to
parent cardinality (so in this case, the obs file gives virtual
evidence for all possible parent values), or

B: nints == nflts (after applying f_rng and i_rng), and nflts <= parent_card,
so in this case, you only specify virtual evidence for a (not nec. strict) subset of the
parent values. In this case, the ints correspond to the parent value that the corresponding
float provides. I.e., if a frame was something like:
2.0 2.5 3.0 1.3 3.4 1 2 4 5 6
Then it would give VE of 2.0 for parent value 1, 2.5 for parent value 2, 3.0 for parent value 4, etc.
Note right now it makes no assumption about the order of the ints, so it has to do a dumb linear search.
I'll probably add the assumption that the ints are sorted and always do binary search, so if you use
this, make sure to keep the ints in sorted order.

Note that it assumes the unspecified parent values are just
0. Also, the VE values are actually the log base e (i.e., natural
log) values, so they can be positive or negative. The values
correspond to the case when the child is observed to be 1. If the
child is hidden, when the child is 0 then the value used is 1.0 -
exp(given_val) (i.e., it pretends that they are real probabilities
in this case even though they don't have to be).
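
As an illustration of case B, here is a minimal Python sketch that looks up the VE value for a parent value in one frame, using binary search over sorted ints. It is not GMTK code: the function names are hypothetical, and the default of 0 for unspecified parent values simply follows the statement above without interpreting it further.

```python
import bisect
import math

def sparse_ve_lookup(floats, ints, parent_value, unspecified=0.0):
    """Return the VE value for parent_value from one case-B frame.

    floats[i] is the VE value for parent value ints[i]; ints must be sorted.
    `unspecified` is returned for parent values not listed in the frame.
    """
    if len(floats) != len(ints):
        raise ValueError("case B requires nints == nflts")
    i = bisect.bisect_left(ints, parent_value)
    if i < len(ints) and ints[i] == parent_value:
        return floats[i]
    return unspecified

def hidden_child_zero_value(given_val):
    """Value used when a hidden child is 0: 1.0 - exp(given_val)."""
    return 1.0 - math.exp(given_val)

# The example frame from the text: five floats followed by five ints.
frame = [2.0, 2.5, 3.0, 1.3, 3.4, 1, 2, 4, 5, 6]
floats, ints = frame[:5], [int(v) for v in frame[5:]]
ve_for_4 = sparse_ve_lookup(floats, ints, 4)
ve_for_3 = sparse_ve_lookup(floats, ints, 3)
```

Here parent value 4 picks up the third float (3.0), while parent value 3 is not listed and falls back to the default.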

The observation files are just like one of the observation files that
you can specify on the command line, i.e., pfile, ascii, binary,
htk. In the HTK, ascii, binary case, it is just a list of files. In
this case, you can't glue multiple such files together like you can on
the command line.

4) better and more informative error messages (most now include file line number location).
------------------------------------------------------------------------------------------

You'll start seeing it. The main thing is that parse errors in files
containing DTs, CPTs, etc. will now tell you much more about the error.

5) better -debug values and much better and more informative verbose output with -verb 50 (or 60, 65, 70, and 80)
--------------------------------------------------------------------

I.e., better use of -debug values during inference. 50, 60, 65, 70, 80
for inference are the five levels of increasing verbosity + better
printing format. Also, the early debug messages are set to much higher
(90 or 100) so that the tracing is more optimized for end users who
want to debug their graph.

-verb 50 : print messages
-verb 60 : add printing clique value insertions
-verb 65 : add printing of separator values iterations
-verb 70 : add printing of all RV evaluations, as in Pr[word(3)=4|parents]=-3.4343
where 'parents' indicates there are parents, and the prob is ln(p)
-verb 80 : add parent values to above, as in:  Pr[word(3)=4|word(2)=4,wordTransition(2)=1]=-3.4343

This should make it *much* easier to debug your graph.

6) smarter gmtkTime with -multi option, no more thrashing when timing bad triangulations
----------------------------------------------------------------------------------

Basically, the problem before was that when you fed 'gmtkTime -multi' a
number of trifiles, the poor ones caused gmtkTime to use up all
available virtual memory before it completed even one partition, so
gmtkTime would essentially never return on those triangulations. The
only way to deal with this was to wait until it crashed and re-start it.

The new version is much smarter. Essentially it forks off a separate
process and limits the forked process to the amount of time given by the
user (plus a bit of slop). Therefore, no triangulation will ever take
more time than the user specified, and it will avoid the thrashing and
crashing problem.
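
The fork-and-limit strategy described above can be sketched generically in Python. This is not how gmtkTime is written; run_with_time_limit is a hypothetical helper, and the commands below are stand-ins rather than real gmtkTime invocations.

```python
import subprocess
import sys

def run_with_time_limit(cmd, seconds, slop=1.0):
    """Run cmd in a child process; return its exit code, or None on timeout."""
    try:
        # subprocess.run kills the child and re-raises if the timeout expires.
        result = subprocess.run(cmd, timeout=seconds + slop)
    except subprocess.TimeoutExpired:
        return None
    return result.returncode

# Stand-in commands: one finishes at once, one would run far too long.
fast = [sys.executable, "-c", "pass"]
slow = [sys.executable, "-c", "import time; time.sleep(30)"]
fast_result = run_with_time_limit(fast, 10.0)
slow_result = run_with_time_limit(slow, 0.2, slop=0.1)
```

Because the work runs in a separate process, a bad triangulation can be abandoned at the deadline without taking the timing loop down with it.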

Once we get this integrated into the scripts that search for new
triangulations, this should speed up that process significantly since
parameters. If you're not aware by now, Simon's find-triangulation
script found a triangulation on his graph that is 80x faster than
completed, and since completed is typically 3-4x faster than the old
version, this is about a 240x speedup over the old version!!

Simon documented his script at:

Also, I should mention that the triangulation that got the 80x speedup
ended up being from a new triangulation heuristic that Chris developed!!
(yeaaaah, sung to the tune of a Dean Scream :-)

Best,

-- Jeff

p.s. Apologies in advance if some of you recognize this email as an
extended compilation of emails I recently sent to you regarding
individual features :-) In particular, thanks much to Karen and
Gang for testing out (and finding bugs in) some of the new features
for me!


From bilmes@ssli-mail.ee.washington.edu Fri Sep 10 11:11:59 2004
Date: Fri, 10 Sep 2004 03:11:46 -0700
From: Jeff Bilmes <bilmes@ssli-mail.ee.washington.edu>
To: gmtk-users@ssli-mail.ee.washington.edu
Subject: [Gmtk-users] dev tag updated (again)

User visible changes:
----------------------------------

gmtkJT now has options to print out clique contents after the collect/distribute evidence stage has completed:

-pCliquePrint <int_range> -cCliquePrint  <int_range> -eCliquePrint  <int_range>

These options apply either when doing the island algorithm or when doing
distribute evidence with gmtkJT. The options will select a set of
cliques (via the integer range) to print out (both clique entry
probabilities and RV values for that entry). With this option, it is
quite easy to get posteriors over any set or subset of variables from a
clique. For example, the aurora training graph trifile for the C
partition is defined as having 1 clique, as:

0-completed
1
0 10 wordCounter 1 word 1 wordPosition 1 wholeWordState 1 obs 1 phoneTransition 1 wordTransition 1 skipSil 2 wordCounter 2 wordPosition 2

This can be changed to have as many cliques as you want, as long as the
RIP still holds. E.g., to get the posteriors for the word transition,
change to:

0-completed
2
0 10 wordCounter 1 word 1 wordPosition 1 wholeWordState 1 obs 1 phoneTransition 1 wordTransition 1 skipSil 2 wordCounter 2 wordPosition 2
1 1 wordTransition 1

(note that the name of the triangulation doesn't matter) and give the
option: -cCliquePrint 1 which will output something like (assuming the
same is done for P):

Partition 0 (P), Clique 1: Printing Clique with 1 variables, 1 entries
0: 1.00000000e+00 wordTransition(0)=0
--------
Partition 1 (C), Clique 1: Printing Clique with 1 variables, 1 entries
0: 1.00000000e+00 wordTransition(1)=0
--------
Partition 2 (C), Clique 1: Printing Clique with 1 variables, 2 entries
0: 9.93243243e-01 wordTransition(2)=0
1: 6.75675676e-03 wordTransition(2)=1
--------
Partition 3 (C), Clique 1: Printing Clique with 1 variables, 2 entries
0: 9.83263080e-01 wordTransition(3)=0
1: 1.67369204e-02 wordTransition(3)=1
--------
Partition 4 (C), Clique 1: Printing Clique with 1 variables, 2 entries
0: 9.72415076e-01 wordTransition(4)=0
1: 2.75849244e-02 wordTransition(4)=1

Note that zeros are not printed. Also note, that the island version of
this will necessarily print the cliques in reverse time order (since
that is the order in which they are completed).
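
To collect these posteriors programmatically, the printed output can be parsed with a short Python sketch like the one below. It assumes the output looks exactly like the sample above (which is only "something like" the real output), and parse_clique_print is a hypothetical helper, not part of GMTK.

```python
import re

HEADER = re.compile(r"^Partition (\d+) \((\w)\), Clique (\d+):")
ENTRY = re.compile(r"^\d+: (\S+) (.+)$")

def parse_clique_print(text):
    """Return {(partition, clique): {assignment_string: probability}}."""
    posteriors = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            current = (int(header.group(1)), int(header.group(3)))
            posteriors[current] = {}
            continue
        entry = ENTRY.match(line)
        if entry and current is not None:
            posteriors[current][entry.group(2)] = float(entry.group(1))
    return posteriors

sample = """\
Partition 2 (C), Clique 1: Printing Clique with 1 variables, 2 entries
0: 9.93243243e-01 wordTransition(2)=0
1: 6.75675676e-03 wordTransition(2)=1
--------
"""
result = parse_clique_print(sample)
```

Since zeros are not printed, any assignment missing from a clique's dictionary should be read as having zero posterior.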

Internal changes (potential speedups):
------------------

- Distribute Evidence pruning (or what could be called DE zero compression)
is now working, and is done automatically. This potentially removes
lots of zeros, and in some cases will significantly speed up EM
training (particularly when there are lots of Gaussian
components). Note that this does not affect gmtkViterbiNew or 'gmtkJT
-probE'. But it will affect gmtkEMTrainNew, and 'gmtkJT -doDist' or
'gmtkJT -island T'

- JT final E root clique is now pruned using same -cbeam, -ckbeam, etc. options. Before,
the root clique was not pruned.

-- Jeff


From bilmes@ssli-mail.ee.washington.edu Sat Oct  2 03:31:56 2004
Date: Fri, 01 Oct 2004 19:31:46 -0700
From: Jeff Bilmes <bilmes@ssli-mail.ee.washington.edu>
To: gmtk-users@seagull.ee.washington.edu
Subject: [Gmtk-users] compiling

A few of you might have noticed a bit of trouble compiling the latest
version where it has problems with a bit of the following code:

#ifdef PIPE_ASCII_FILES_THROUGH_CPP
#ifndef DECLARE_POPEN_FUNCTIONS_EXTERN_C
extern "C" {
FILE     *popen(const char *, const char *) __THROW;
int pclose(FILE *stream) __THROW;
};
#endif
#endif

Basically, this has to do with cygwin, as cygwin needs the extra bit of code. If
you're having trouble, try:

make EXCFLAGS="-DDECLARE_POPEN_FUNCTIONS_EXTERN_C"

for now. Note that once the Makefiles and compiling strategy are redone
(which hopefully will be soon), this problem will go away.

-- Jeff


From gang@ssli-mail.ee.washington.edu Wed Oct  6 23:35:28 2004
Date: Wed, 06 Oct 2004 15:35:22 -0700
From: Gang Ji <gang@ssli-mail.ee.washington.edu>
To: gmtk-users@ssli.ee.washington.edu
Subject: [Gmtk-users] fngram with fewer parents

Hi all,

I have updated the GMTK development tag.  Factored ngrams with fewer
parents are now supported.

For example, if you have a factored language model for P(w_t |
w_{t-1}, w_{t-2}, pos_t), you can use FLM in this case.  There is no
change in the master file.

flm % name
3 % number of parents
VOCAB_SIZE VOCAB_SIZE POS_SIZE VOCAB_SIZE
your_flm_file W-vocab_for_word:T-vocab_for_tag

In the structure file, you can say

conditionalparents: word(-1), word(-2), tag(0) using FNGramCPT("flm")

if you want to use all the parents and

conditionalparents: word(-1), tag(0) using FNGramCPT("flm:0,2")

if you only want to use word(-1) and tag(0).

Note that the indices ("0,2" in this case) must be compatible with the
factored language model specification.

Best,
Gang

--
Office:   (206) 221-5216
Fax:      (206) 543-3842
email:    gang@ee.washington.edu
homepage: welcome.to/rainier


From gang@ssli-mail.ee.washington.edu Tue Oct 12 02:03:07 2004
Date: Mon, 11 Oct 2004 18:03:00 -0700
From: Gang Ji <gang@ssli-mail.ee.washington.edu>
To: gmtk-users@ssli.ee.washington.edu
Subject: [Gmtk-users] fngram syntax change

Hi all,

Following Jeff's suggestion, there is a syntax change for factored
ngrams with fewer parents.  In the structure file, the user should say

using FNGramCPT("fngram", 0, 2)

if the user wants to use factored language model "fngram" and use only
the first (0) and third (2) parents specified in the FLM file.  This
syntax is more consistent with the rest of the syntax in GMTK.

The new code was checked in and dev-tagged.
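
The zero-based parent indices in this syntax can be illustrated with a tiny Python sketch. The helper select_parents and the parent list are hypothetical; the sketch only shows how indices map to positions in the FLM specification and does not check that the chosen subset is one the FLM file actually supports.

```python
def select_parents(flm_parents, indices):
    """Pick parents by their zero-based position in the FLM specification."""
    for i in indices:
        if not 0 <= i < len(flm_parents):
            raise IndexError(f"parent index {i} not in FLM specification")
    return [flm_parents[i] for i in indices]

# Hypothetical parent order as listed in an FLM file.
flm_parents = ["word(-1)", "word(-2)", "tag(0)"]
chosen = select_parents(flm_parents, (0, 2))
```

Indices 0 and 2 select the first and third parents, matching the FNGramCPT("fngram", 0, 2) example.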

Best,
Gang


From bilmes@ssli-mail.ee.washington.edu Sat Nov  6 17:31:06 2004
Date: Fri, 05 Nov 2004 12:17:42 -0800
From: Jeff Bilmes <bilmes@ssli-mail.ee.washington.edu>
To: gmtk-users@seagull.ee.washington.edu
Subject: [Gmtk-users] stuff

Hi Everyone,

There hasn't been much traffic on this list for a while, but please
don't think that there aren't major gmtk improvements planned (I've just
been really busy). More significant speedups are on the way!!

I do, however, want to remind everyone that Evan has completed a fairly
stable version of gmtkViz, the gmtk visualization tool that already is
quite useful. If you check out the latest version, you can type 'make
gmtkViz' in the tksrc directory to compile it (at least on ssli/linux
for now).

Here is a quick list of non-obvious features:

1) the program is not meant for creating graphs; rather, it is for creating
a nice picture of your graph once you have edited the .str file. It will
create a position file for each structure file that saves the position
of each node.
2) The position editing is quite good; it's sort of a combination
of some of the features in Microsoft PowerPoint and Adobe Illustrator. It
will display the three partitions (P,C,E) and any frames within those partitions.
3) you can lay out one frame/partition, and then copy the layout of one frame/partition
to another frame/partition. This feature is really crucial, as it allows you to
quickly lay out even quite complicated graphs.
4) the program can save to postscript, for inclusion in latex/msword documents.
5) It has lots of options for changing edge/node color, arrow display, control-point size, etc.
Edges are rendered as splines with control points added to control curvature.

A few non-obvious features (from Evan):
a) Pushing the delete key deletes any selected control points on
an edge.
b) Right-clicking a control point adds a control point to an edge.
c) If a line has no control points (except its endpoints), clicking on the

I've already used it a few times and found it a great tool to display a graph
you want to explain to someone. Also, it's a good way to check that your graph
does what you want.

Please try it out, and if you find bugs, please send them to gmtk-bugs
(but note that Evan is now gone, so the bugs won't be fixed until we can
find a replacement).

A couple of known problems:
1) if you read in a graph that doesn't parse, the entire program will die.
2) sometimes the program crashes for no apparent reason (not as bad as it sounds).

Therefore, save your work OFTEN when using it!

-- Jeff


From bilmes@cuba.ee.washington.edu Mon Nov 15 18:37:32 2004
Date: Mon, 15 Nov 2004 10:27:52 -0800
From: Jeff Bilmes <bilmes@cuba.ee.washington.edu>
To: Joe Frankel <joe@cstr.ed.ac.uk>
Subject: Re: virtual evidence CPTs

Yes, anything less than -1.0E10 should work.
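
For the conversion asked about in the quoted message below, a minimal Python sketch might look like this. The constant LOG_ZERO = -1.0e11 is an assumed choice that merely satisfies "less than -1.0E10", and posteriors_to_log is a hypothetical helper.

```python
import math

LOG_ZERO = -1.0e11  # assumed floor; the reply only says "less than -1.0E10"

def posteriors_to_log(posteriors):
    """Convert probabilities in [0, 1] to natural logs, flooring zeros."""
    return [math.log(p) if p > 0.0 else LOG_ZERO for p in posteriors]

logs = posteriors_to_log([0.7, 0.3, 0.0])
```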

-- Jeff

In the message dated Mon, 15 Nov 2004 13:11:00 GMT
Joe Frankel <joe@cstr.ed.ac.uk> writes:
>
>Hi there - hope all is well.
>
>I am currently implementing a hybrid DBN/ANN system, and was wondering
>what value for log of zero you use in GMTK. I need to convert my ANN
>posteriors (which are between 0 and 1) into the log domain, and therefore
>have to decide what value to assign to zero probabilities.
>
>thanks for any help,
>
>Joe.
>
>
>
>^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>Joe Frankel
>         Centre for Speech Technology,
>         University of Edinburgh,
>         2 Buccleuch Place, EH8 9LW.
>         Tel: +44 131 651 1769
>         Fax: +44 131 650 4587
>
>E-mail: joe@cstr.ed.ac.uk
>http://www.cstr.ed.ac.uk/~joe
>
>^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>


From bilmes@cuba.ee.washington.edu Fri Nov 19 19:55:27 2004
Date: Fri, 19 Nov 2004 11:55:18 -0800
From: Jeff Bilmes <bilmes@cuba.ee.washington.edu>
To: Simon.King@ed.ac.uk
Cc: Joe Frankel <joe@cstr.ed.ac.uk>
Subject: Re: [Gmtk-users] GMTK dev tag update, major new features

In the message dated Tue, 16 Nov 2004 15:42:49 GMT
Simon King <Simon.King@ed.ac.uk> writes:
>Hi Jeff,
>
>we are making our first attempts at using virtual evidence
>
>> 3) virtual evidence CPTs + Hybrid DBN/{ANN,SVM}
>> ------------------------------------------------------
>
>And our model uses A:
>
>>      A: nints == 0 and nflts (after applying f_rng) must be equal to
>>      parent cardinality (so in this case, the obs file gives virtual
>>      evidence for all possible parent values), or
>
>i.e. we always have a complete set of virtual evidence, every frame, for
>all possible parent values

Ok, sounds good.

>
>A few questions:
>
>virtual evidence is just a PDF (right?) giving information about a
>hidden parent RV that it is connected to, e.g.

Actually, it really doesn't need to be a PDF (the TR I sent you talks
can use any values you want).

>
>
>P = hidden RV, cardinality 5
>V = virtual evidence node, reading in 5 values per frame
>P is a parent of V
>
>P----->V
>
>Now....in the above case, the pdf of P is always simply read off from V
>and doesn't depend on the values of any parents or other children of P -
>correct?

It's not the pdf of P. Rather, P still has its own PDF (i.e., Pr(P=p)),
and it gets additional evidence values from the P(V=1|P=p) which can
be thought of as a function of p.
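
A small numeric sketch in Python may make this concrete. It considers P in isolation, with no other parents or children (in a real graph those would contribute too), and uses made-up numbers; the function name is hypothetical.

```python
import math

def posterior_with_virtual_evidence(prior, log_ve):
    """Return Pr(P=p | V=1), proportional to Pr(P=p) * P(V=1 | P=p).

    prior: list of Pr(P=p); log_ve: list of natural-log evidence values.
    """
    unnormalised = [pr * math.exp(lv) for pr, lv in zip(prior, log_ve)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

prior = [0.5, 0.25, 0.25]
log_ve = [math.log(0.1), math.log(0.4), math.log(0.4)]
post = posterior_with_virtual_evidence(prior, log_ve)
```

The prior favours the first value, but the evidence pulls the posterior toward the other two, so the result differs from both the prior and the raw evidence values.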

>
>So....if we have some complex graph, on to which we want to add some
>virtual evidence, then we cannot simply attach V to an existing parent
>P, we must introduce an intermediate parent Q, and attach it thus:
>
>P----->Q----->V
>
>Where P is connected to all sorts of other variables, but Q is only
>connected to P and V. Are we correct? And the way Q depends on P is

This might be a thing you want to do, but it depends on the application
(i.e., Q might be clusters of P).

>any hints?

If Q is a copy-parent of P, then there is no reason to have Q.

>
>
>Finally - normalisation: our virtual evidence is the output of a neural
>net (which was trained to do 1-of-N classification). Should we be
>normalising within a frame s.t. the outputs sum to  1 (i.e. as we would
>for a hybrid HMM/ANN system) , OR do we normalise per output (i.e. per

You don't need to normalize if you don't want to. In the hybrid scheme,
they divide by priors (this is also mentioned in the TR).

>parent value) s.t. the range of numbers (e.g. across a sentence or data
>set) is 0->1 ?

Actually no, you don't need to do this normalization. Again, see the TR
which I think explains it.
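
The division by priors mentioned above for the hybrid scheme can be sketched in Python as follows. The numbers are made up, the function name is hypothetical, and nonzero posteriors and priors are assumed.

```python
import math

def scaled_log_likelihoods(posteriors, priors):
    """Return log(posterior / prior) per class, in natural log."""
    return [math.log(q) - math.log(pr) for q, pr in zip(posteriors, priors)]

# Made-up ANN posteriors for one frame and class priors from training data.
scores = scaled_log_likelihoods([0.6, 0.3, 0.1], [0.5, 0.25, 0.25])
```

The resulting values need not be log probabilities, which is consistent with the point that the virtual evidence does not have to be normalised.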

I'll try to finish the TR with figures and everything soon which I think
will make it much clearer. Also, if you've got comments on the TR, please
let me know!!

Hope this helps!

-- Jeff

>
>
>Simon
>
>--
>Dr. Simon King                               Simon.King@ed.ac.uk
>Centre for Speech Technology Research          www.cstr.ed.ac.uk
>For MSc/PhD info, visit  www.hcrc.ed.ac.uk/language-at-edinburgh


From gang@ssli-mail.ee.washington.edu Wed Jan 19 00:55:25 2005
Date: Tue, 18 Jan 2005 16:55:17 -0800
From: Gang Ji <gang@ssli-mail.ee.washington.edu>
To: gmtk-users@ssli.ee.washington.edu
Subject: [Gmtk-users] dev tag updated

Hi GMTKer's,

Bug fixes: there was a bug when two or more iterators for one
NGramCPT/FNGramCPT exist.  This bug was fixed in this version.

*** how to create vocab object ***

% in master file
VOCAB_IN_FILE inline
1 % number of vocabs
0 % index
wordVocab % name
27682 % cardinality
w.dct % file for word list

*** how to use ngram cpt ***

% in master file
NGRAM_CPT_IN_FILE inline
1 % number of ngrams
0 % index
ngram % name
2 % number of parents
27682 27682 27682 % cardinalities
bigtrigram.lm wordVocab % ARPA lm file name followed by vocab object name

% in structure file
conditionalparents: word(-2), word(-1) using NGramCPT("ngram");
conditionalparents: word(-1) using NGramCPT("ngram"); % also supports fewer parents

*** how to use fngram cpt ***

% in master file
FNGRAM_CPT_IN_FILE inline
1 % number of fngrams
0 % index
fngram % name
3 % number of parents
27680 27680 25 27680 % cardinalities
meeting.flm W-wordVocab:P-posVocab % FLM filename followed by vocab mapping

% in structure file
conditionalparents: word(-2), word(-1), pos(0) using FNGramCPT("fngram");
conditionalparents: word(-2), word(-1) using FNGramCPT("fngram", 0, 1);
% this means using the 0th and 1st parents in the FLM specification

Best,
Gang

--
Office:   (206) 221-5216
Fax:      (206) 543-3842
email:    gang@ee.washington.edu
homepage: welcome.to/rainier

From bilmes@ssli-mail.ee.washington.edu Tue Jan 11 13:52:00 2005
Date: Tue, 11 Jan 2005 05:51:47 -0800
From: Jeff Bilmes <bilmes@ssli-mail.ee.washington.edu>
To: gmtk-users@seagull.ee.washington.edu
Subject: [Gmtk-users] dev tag updated

Major changes:

- GMTK now supports arbitrary disconnected networks, where the network
can either be disconnected within a partition, or across partitions.
Also, we may now have empty P and/or E partitions, and disconnected
C partitions (so if only a C is given, this can act as a static
network that gets replicated as the observations get longer). A static
network can now be either just a C (without a P or an E), or an
E'=CE unrolled by zero.

- As a result of GMTK supporting disconnected networks, triangulations
are now formed from UGMs that do not necessarily have observed
parents connected to their children. This might significantly speed up
some graphs where you have observed variables that are both children
and parents.

-  Unfortunately, I changed the trifile format, since the old-format graph ID had a bug:
it stored neither the chunk information nor the end of the graph (so
truncating frames in a graph might have caused problems). We now use a special
string at the end of the ID. The tri-file information is still valid, so you can
use the same trifiles after a quick by-hand edit.

To go from the old to the new format, you just need to add the chunk
frames (two integers) and the special string
"@@@!!!TRIFILE_END_OF_ID_STRING!!!@@@" at the end of the graph id.

In other words, the beginning of every trifile contains a condensed version
of the .str file that is used to ensure that the trifile belongs with the .str file.
Here is an example of the beginning of one such file for auroraTutorial, in old format:

===========================================================
% Structure File Identification Information
0 skipSil 0 2 D 0 1 0
1 wordCounter 0 22 D 0 1 1 skipSil 0
2 word 0 13 D 0 1 1 wordCounter 0
3 wordPosition 0 8 D 0 1 0
4 wholeWordState 0 91 D 0 1 2 word 0 wordPosition 0
5 obs 0 91 C 0 1 1 wholeWordState 0
6 phoneTransition 0 2 D 0 1 1 wholeWordState 0
7 wordTransition 0 2 D 0 1 3 word 0 wordPosition 0 phoneTransition 0
8 skipSil 1 2 D 0 1 0
9 wordCounter 1 22 D 0 1 3 wordCounter -1 wordTransition -1 skipSil 0
10 word 1 13 D 0 1 1 wordCounter 0
11 wordPosition 1 8 D 0 1 3 wordTransition -1 phoneTransition -1 wordPosition -1
12 wholeWordState 1 91 D 0 1 2 word 0 wordPosition 0
13 obs 1 91 C 0 1 1 wholeWordState 0
14 phoneTransition 1 2 D 0 1 1 wholeWordState 0
15 wordTransition 1 2 D 0 1 3 word 0 wordPosition 0 phoneTransition 0
16 skipSil 2 2 D 0 1 0
17 wordCounter 2 22 D 0 1 3 wordCounter -1 wordTransition -1 skipSil 0
18 word 2 13 D 0 1 1 wordCounter 0
19 wordPosition 2 8 D 0 1 3 wordTransition -1 phoneTransition -1 wordPosition -1
20 wholeWordState 2 91 D 0 1 2 word 0 wordPosition 0
21 obs 2 91 C 0 1 1 wholeWordState 0
22 phoneTransition 2 2 D 0 1 1 wholeWordState 0
23 wordTransition 2 2 D 0 1 3 word 0 wordPosition 0 phoneTransition 0
24 endOfUtteranceObservation 2 2 D 0 1 2 wordTransition 0 wordCounter 0

...<stuff>...

===========================================================

where <stuff> is the rest of the trifile.  The new file format for
the trifile must have the chunk numbers and the end string at the end
of the ID section. I.e., if the .str file had "chunk 1:1" at the end,
the above id would turn into:

===========================================================
% Structure File Identification Information
0 skipSil 0 2 D 0 1 0
1 wordCounter 0 22 D 0 1 1 skipSil 0
2 word 0 13 D 0 1 1 wordCounter 0
3 wordPosition 0 8 D 0 1 0
4 wholeWordState 0 91 D 0 1 2 word 0 wordPosition 0
5 obs 0 91 C 0 1 1 wholeWordState 0
6 phoneTransition 0 2 D 0 1 1 wholeWordState 0
7 wordTransition 0 2 D 0 1 3 word 0 wordPosition 0 phoneTransition 0
8 skipSil 1 2 D 0 1 0
9 wordCounter 1 22 D 0 1 3 wordCounter -1 wordTransition -1 skipSil 0
10 word 1 13 D 0 1 1 wordCounter 0
11 wordPosition 1 8 D 0 1 3 wordTransition -1 phoneTransition -1 wordPosition -1
12 wholeWordState 1 91 D 0 1 2 word 0 wordPosition 0
13 obs 1 91 C 0 1 1 wholeWordState 0
14 phoneTransition 1 2 D 0 1 1 wholeWordState 0
15 wordTransition 1 2 D 0 1 3 word 0 wordPosition 0 phoneTransition 0
16 skipSil 2 2 D 0 1 0
17 wordCounter 2 22 D 0 1 3 wordCounter -1 wordTransition -1 skipSil 0
18 word 2 13 D 0 1 1 wordCounter 0
19 wordPosition 2 8 D 0 1 3 wordTransition -1 phoneTransition -1 wordPosition -1
20 wholeWordState 2 91 D 0 1 2 word 0 wordPosition 0
21 obs 2 91 C 0 1 1 wholeWordState 0
22 phoneTransition 2 2 D 0 1 1 wholeWordState 0
23 wordTransition 2 2 D 0 1 3 word 0 wordPosition 0 phoneTransition 0
24 endOfUtteranceObservation 2 2 D 0 1 2 wordTransition 0 wordCounter 0
1 1
@@@!!!TRIFILE_END_OF_ID_STRING!!!@@@

...<same stuff afterwards> ...

===========================================================
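
The by-hand edit described above could also be scripted. Here is a minimal
Python sketch, not part of GMTK. It assumes the ID section is the leading run
of comment lines followed by lines numbered 0, 1, 2, ..., and that whatever
follows the ID section does not continue that numbering. The function name and
its arguments are made up for illustration.

```python
END_OF_ID_STRING = "@@@!!!TRIFILE_END_OF_ID_STRING!!!@@@"


def upgrade_trifile_id(lines, chunk_first, chunk_last):
    """Return a copy of `lines` with the chunk frames and the end-of-ID
    string inserted right after the graph ID section.

    Assumption (not confirmed by GMTK): the ID section is a run of lines
    whose first field counts up 0, 1, 2, ... after any leading comments.
    """
    out = []
    i = 0
    # Copy leading blank lines and '%' comment lines unchanged.
    while i < len(lines) and (not lines[i].strip()
                              or lines[i].lstrip().startswith("%")):
        out.append(lines[i])
        i += 1
    # Copy the numbered ID lines for as long as the numbering continues.
    expected = 0
    while i < len(lines):
        fields = lines[i].split()
        if not fields or fields[0] != str(expected):
            break
        out.append(lines[i])
        expected += 1
        i += 1
    if expected == 0:
        raise ValueError("no graph ID lines found")
    # New format: two chunk integers, then the special end-of-ID string.
    out.append(f"{chunk_first} {chunk_last}")
    out.append(END_OF_ID_STRING)
    out.extend(lines[i:])
    return out
```

For a .str file ending in "chunk 1:1" you would call
upgrade_trifile_id(lines, 1, 1). Check the result by eye before using it.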

- For graph debugging: warnings are now printed as soon as we start getting zeros, either for cliques
or for separators (in the latter case, if even a minor degree of separator pruning is on).
This is enabled with '-verb 40', so you'll see exactly the frame where
you start getting zeros.

- VECPT syntax change again (this will be the last one). This is on top
of Karim's recent change. The VECPT now has the following format. The
first set of options matches that of any other CPT. Namely, we have
1) a name,
2) num parents (which must be 1 in this case),
3) parent cardinality,
4) self cardinality (which must be 2 in this case),
5) observation file name for the VECPT.

Next, we have a set of optional arguments that the user may give
to the observation code. If an optional argument is not given,
then its default value is used.  These optional arguments
consist of lines of a "flag : value" syntax, where "flag"
names the argument, and "value" is its value.
The VECPT and optional arguments *MUST* end with the string "END".

Here is an example:

0
VECPT0  % name of VECPT
1 % num par
2 % par card
2 % self card
VECPT0_FILE % file to read in.
nfs:2 % nfloats
nis:0 % nints
frs:all % float range
irs:all % int range
pr:all % must be all
fmt:ascii
swap:F % endian swapping condition
preTransforms:X
postTransforms:X
sentRange:all
END

but many are optional, so this is the same as:

0
VECPT0  % name of VECPT
1 % num par
2 % par card
2 % self card
VECPT0_FILE % file to read in.
nfs:2 % nfloats
nis:0 % nints
fmt:ascii
swap:F % endian swapping condition
END

The shortest would be:

0
VECPT0  % name of VECPT
1 % num par
2 % par card
2 % self card
VECPT0_FILE % file to read in.
END
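
If you generate master files from scripts, the layout above is easy to emit.
The following Python sketch is only an illustration, not a GMTK tool: the
function name is hypothetical, and it simply writes the five required fields,
then whatever "flag:value" options you pass, then END. It does not check that
the flag names are ones GMTK accepts.

```python
def format_vecpt(index, name, parent_card, obs_file, **options):
    """Build the text of one VECPT entry in the layout shown above.

    `options` are written verbatim as flag:value lines (e.g. nfs=2,
    fmt="ascii"); no validation of flag names is attempted here.
    """
    lines = [
        str(index),
        f"{name} % name of VECPT",
        "1 % num par",                 # must be 1 for a VECPT
        f"{parent_card} % par card",
        "2 % self card",               # must be 2 for a VECPT
        f"{obs_file} % file to read in.",
    ]
    for flag, value in options.items():
        lines.append(f"{flag}:{value}")
    lines.append("END")                # the entry must end with END
    return "\n".join(lines)


print(format_vecpt(0, "VECPT0", 2, "VECPT0_FILE", nfs=2, nis=0, fmt="ascii"))
```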

Minor changes:

-  verbose messages now working for reading parameters. Try -debug 90 and you'll
see what parameters are being read in.

-  Names P1, Co, E1 are now printed P',C',E', where for left interface
P'=P, C'=C, E'=[CE], and for right interface, P'=[PC],C'=C,E'=E. -verb 80 output
further prettied.

-  Removed almost all of the old and unused code from the CVS distribution.

Bug fixes

- various bug fixes, including a few regarding switching parents and EM training.

-- Jeff

From bilmes@ssli-mail.ee.washington.edu Fri Dec 24 17:10:38 2004
Date: Fri, 24 Dec 2004 09:10:27 -0800
From: Jeff Bilmes <bilmes@ssli-mail.ee.washington.edu>
To: gmtk-users@seagull.ee.washington.edu
Cc: mukundn@seagull.ee.washington.edu
Subject: [Gmtk-users] GMTK dev tag updated

Major changes:

**  well-defined template restrictions. Now insists on:

In the left interface case (with M=S=1), we must have:

[P | C E] == [P | C C E] == [P C | C E]

What this means is this: The vertical bar '|' cuts the graph (via edges)
into a left and a right portion.  In the left interface case, the nodes
on the right of the edge cut must be the same (relatively) with respect
to each other (i.e., they are the same nodes, by at most a shift by
S*numframesin(C) frames).  Note that when P is empty, we don't need to
check the interface at all since we are guaranteed E does not reach to
the left beyond one C, so there is only one interface in total, the C-C
interface, in C | C E.

In the right interface case (with M=S=1), we must have:

[P C | E] == [P C C | E] == [P C | C E]

Here, we check that the nodes on the *left* of the cut are relatively
the same.

The above is now the only graphical restriction placed on the
template (other than the obvious no directed cycles).  This is done
so that the boundary provided by boundary algorithm can be validly
used to compute the interface for the P-C boundary, the C-C
boundary, and the C-E boundary.

As an example, we can now have a variable in E ask for a parent in,
say, P, if the above restrictions are followed (which might mean
that C has to ask for the same variable in P, and that the variable
in P being asked for also exists in C).

Just in case, here is a complete picture of the above, for arbitrary M and S.
Left interface:
P | C(1) C(2) ... C(M) E
P | C(1)  C(2)   ...     C(M+S) E
P  C(1) ...  C(S) | C(S+1) ... C(M+S) E
Right interface:
P C(1) C(2) ... C(M) | E
P C(1)  C(2)   ...     C(M+S) | E
P C(1) ...  C(S) | C(S+1) ... C(M+S) E

Therefore, you can increase M and/or S to relax further the restriction above
(but at the cost of more restricted length T, see below).

** Now officially supports higher-order Markov chains (i.e. variables
with parents -2, +2, -3, +3, N-chunk-crossing, N>1, etc. as long as
the template abides by the restrictions mentioned above). You will
need to retriangulate to get this working (i.e., the trifile is what
the problem was before).

Please test this out and let me know if you find any problems!!!

** As a consequence, we can now use the case of unrolling by zero (i.e.,
using the basic .str template). From the basic user template partitions of
P, C, and E with frame lengths p, c, and e, this says that we may
process T frames, where
T = p + (M+kS)*c + e
where k>=0 (before the requirement was k>=1).
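
To see which utterance lengths a given template can handle, the formula can be
checked directly. This is a small Python sketch of the arithmetic only (the
function name is made up, and it says nothing about how GMTK itself does the
check):

```python
def unroll_count(T, p, c, e, M=1, S=1):
    """Return k >= 0 such that T = p + (M + k*S)*c + e, or None if the
    length T cannot be produced by this template."""
    if c <= 0 or S <= 0:
        raise ValueError("c and S must be positive")
    remainder = T - p - e - M * c      # frames left for the k*S extra chunks
    if remainder < 0 or remainder % (S * c) != 0:
        return None
    return remainder // (S * c)


# One-frame P, C and E with M=S=1: T=3 is now allowed (k=0).
print(unroll_count(3, 1, 1, 1))
# Empty P, two-frame C, one-frame E, M=1, S=2: T=5 does not fit.
print(unroll_count(5, 0, 2, 1, M=1, S=2))
```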

This also means that GMTK can be used for static network inference. I.e.,
just make P empty, put your graph in C E, do a quick triangulation of C' (say
completed), and do a careful triangulation of E'.

(note: soon, we'll also be supporting disconnected networks so that you can
put the static network in C as an option, and have neither a P nor an E, right
now you need at least one of P and C).

Also note: as a consequence, on positive graphs, you'll also see a space
reduction by about one half on the C storage. On mixed stochastic/deterministic
graphs you might see a small space saving.

Minor changes:

** More/updated comments and error messages

** Compiles again on cygwin. Should compile on both cygwin and linux
without code change (hopefully:)

** gmtkTime supports a '-times' argument in non-multitest mode, to
run a timing multiple times.

-- Jeff

From bilmes@ssli-mail.ee.washington.edu Tue Jan  4 07:41:54 2005
Date: Mon, 03 Jan 2005 23:41:44 -0800
From: Jeff Bilmes <bilmes@ssli-mail.ee.washington.edu>
To: gmtk-users@seagull.ee.washington.edu
Subject: [Gmtk-users] dev tag update

Folks,

Two GMTK dev-tag updates worth mentioning are now in.

Summary:

1) Re-written Decision Trees: new DTs are now much faster and take less memory.
Also, depending on how you write your DT, things can get faster still.

2) Virtual Evidence Separators: Can speed up exact inference when you've got
a stochastic graph with some constraints.

Extended:
============================================================================
============================================================================
1) Re-written Decision Trees

At long last, I've re-done the code for the dreaded decision tree
non-leaf nodes.

Interestingly, I've found that this change gives anywhere from a 3% to 30% inference
speedup on the graphs (depending on how much determinism there is and how it is
specified).  The new DTs also use significantly less memory, and take much less
time to load in (for some DTs, reading time goes from 1/2-hour to
seconds). It should now be possible to use some of those big LVCSR DTs
that people have wanted to use.

Basically, the reason for the speedup is better memory management and
implementation of DT queries. The new version also supports a new syntax
for DTs which is much faster (especially to read).  All old DTs should
still work as is however (and with a speedup). Basically, whenever you
have a construct like:

0 16 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 default
-1 50
-1 39
etc.

This *ascending*, consecutive order of ints will be fast since it will use
direct indexing. Note that the order right now needs to be
*ascending*, *consecutive*, and *integer* (no individual ranges) to get
direct indexing. In fact, there is now a shortcut for this that you can
use:

0 16 0 ... 14 default
-1 50
-1 39
etc.

If you've got any long consecutive strings of numbers, (and some DTs
I've seen have many 10s of thousands of such numbers), this will load in
faster and take *significantly* less memory. Also, the first position
can be anything, as long as the rest are ascending and consecutive.
I.e., you can do:

0 16 5 ... 19 default
-1 50
-1 39
etc.

which will have the same speed.

Put another way, it will be beneficial to express your DTs in this way
whenever possible. If the sets are all integers but the order is not
ascending or the ints are not consecutive, you'll still get a speedup
since a hash table is now used to do the mapping (rather than
the old way of binary searches with ranges).

In other words, this:

0 4 0 1 2 default
-1 50
-1 39
etc.

will be the same speed as this:

0 4 0 ... 2 default
-1 50
-1 39
etc.

which will be faster than this (uses a hash table):

0 4 1 0 2 default
-1 39
-1 50
etc.

which will be faster than this (which uses int range objects):

0 4 1:1 0:0 2:0 default
-1 39
-1 50
etc.

I.e., the first two use direct indexing, the third uses a hash table, and
the fourth uses the more general range objects.
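
To check which of the three cases a given split line falls into, the rules
above can be written down as a small Python sketch. This only restates the
rules from this message; it is not the GMTK code, and the function names are
made up. The input is the list of value specs on the split line, without the
trailing "default".

```python
def expand_ellipsis(tokens):
    """Expand the 'first ... last' shortcut into the full list of ints."""
    if len(tokens) == 3 and tokens[1] == "...":
        first, last = int(tokens[0]), int(tokens[2])
        return [str(v) for v in range(first, last + 1)]
    return list(tokens)


def lookup_kind(tokens):
    """Classify a split as 'direct', 'hash' or 'range' per the rules above."""
    values = expand_ellipsis(tokens)
    # Anything that is not a plain integer (e.g. '1:1') needs range objects.
    if not all(t.lstrip("-").isdigit() for t in values):
        return "range"
    ints = [int(t) for t in values]
    # Ascending and consecutive integers allow direct indexing.
    if all(b == a + 1 for a, b in zip(ints, ints[1:])):
        return "direct"
    return "hash"


for spec in ("0 1 2", "0 ... 2", "1 0 2", "1:1 0:0 2:0"):
    print(spec, "->", lookup_kind(spec.split()))
```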

Indeed, more smarts can be done to do things like detect when a list of
ranges is really an ascending consecutive list, or when a hash could be
the same. Perhaps someday this will be done, but for now at least if you
know what you are doing you can get the speedup and memory savings
(which can be substantial. On some LVCSR graphs, the old DTs alone were

============================================================================
============================================================================
2) Virtual Evidence Separators.

GMTK now supports "Virtual Evidence Separators". Basically, whenever
GMTK finds a case of either:

case 1) a child observed = 1 and deterministically related to parents
(and if there is no switching, iterable-ness, and if the child
is constant immediate observed, meaning the observation value
has to come from the str file, and can't be any of the
"variable" keywords such as "frame", etc.). I.e.,

P1 P2 ... PN
\ \  / /
C=1

case 2) a grandchild observed = 1, randomly related to a child (the
parent of grandchild), and if the child is deterministically
related to parents (and otherwise the same conditions as (1)
hold). I.e.,

P1 P2 ... PN
\ \  / /
C
|
v
G=1

Note that C = f(P_{1:N}) (i.e., deterministic), and G = random(C) but
random() can't be an iterable function (one that changes from utterance
to utterance) nor can random() be a VECPT (which is essentially iterable
since it changes from frame to frame).

Then GMTK can optionally use VE separators to deal with the implicit
constraints imposed on the parents by the (case 1) child or (case 2)
grandchild.

To do this, GMTK offline iterates through all parent values and builds
a table of which parents satisfy the child (or grandchild) with non-zero
probability. Then when iterating through the variables, it iterates
through all satisfying parents simultaneously (since they are dependent
conditioned on the child (or grandchild)) rather than separately
like in the old version.

This set of parent values is either computed anew once each time GMTK is
run, or is additionally optionally stored on disk so that next time GMTK
runs it will load rather than re-generate the tables.  The reason for
the disk file is that when there are many parent values that need to be
checked, computing this table can take a long time.
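
As a rough picture of the table for case 1, here is a Python sketch. It is an
illustration of the idea only and not how GMTK stores or builds its tables:
the function name is made up, and the deterministic child is passed in as an
ordinary Python function f of the parent values.

```python
from itertools import product


def build_ve_table(parent_cards, f, observed_value=1):
    """List every joint parent assignment for which the deterministic
    child f(parents) equals the observed value (case 1 above)."""
    ranges = (range(card) for card in parent_cards)
    return [values for values in product(*ranges)
            if f(*values) == observed_value]


# Two parents with cardinalities 2 and 3; the child is 1 when they sum to 2.
table = build_ve_table((2, 3), lambda a, b: int(a + b == 2))
print(table)
```

The loop visits the full product of the parent cardinalities, which is why
the table can take a long time to compute when there are many parent values.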

Note you are not guaranteed that your graphs have any VE seps in them
(it might only benefit some graphs), and even if your graph is
"VE-separable", speedups aren't guaranteed. In some cases I've seen a
speedup but even then it wasn't big. The speedup you get I think will be
significantly influenced by the triangulation, so the decision of whether
to use VE seps will soon be integrated into the triangulation search step.

The default behavior of GMTK currently has veseps turned off.  The
relevant program options are:

-useVESeparators
Use Virtual Evidence (VE) Separators (if any are available) during inference.
The option is "-useVESeparators n", where n = 0x0, 0x1, 0x2, or 0x3

The option is a 2-bit bitmask on what kind of VE sep to use. 0x1
corresponds to the direct child (parents and child), while 0x2
corresponds to parents, a deterministic child, and a random
grandchild (see above)

-veSepWhere
Where to use VE seps. Bitwise or of 0x1 (P), 0x2 (C), 0x4 (E)
Usage is "-veSepWhere n" where n = 0x0 ... 0x7, and the
argument is a 3-bit bitmask saying where (if any are available) the veseps
should be used. 0x1 is P, 0x2 is in C, and 0x4 is in E.

If either -useVESeparators or -veSepWhere is 0, you turn off VEseps.
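
The two bitmasks can be read off as in the following Python sketch. The bit
values come from the option descriptions above; the function and constant
names are made up for illustration.

```python
USE_PARENTS_CHILD = 0x1              # case 1: parents and observed child
USE_PARENTS_CHILD_GRANDCHILD = 0x2   # case 2: deterministic child, random grandchild
WHERE_BITS = {"P": 0x1, "C": 0x2, "E": 0x4}


def describe_vesep_options(use_mask, where_mask):
    """Explain a (-useVESeparators, -veSepWhere) pair in words."""
    if use_mask == 0 or where_mask == 0:
        return "VE separators off"
    kinds = []
    if use_mask & USE_PARENTS_CHILD:
        kinds.append("parents+child")
    if use_mask & USE_PARENTS_CHILD_GRANDCHILD:
        kinds.append("parents+child+grandchild")
    partitions = [name for name, bit in WHERE_BITS.items() if where_mask & bit]
    return f"{', '.join(kinds)} in {', '.join(partitions)}"


print(describe_vesep_options(0x1, 0x6))
print(describe_vesep_options(0x3, 0x0))
```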

-veSepFileName
Name of VE separators file to store VE sep/read previous VE sep info.
There is a default file name, and if you use ve seps it will right now
always write to this file.

-veSepRecompute
Force a re-compute of VE separator information. This is necessary when
you change some of the options above. I.e., if you decide to use VEseps
in E, you'll need to recompute the file (otherwise you'll get an error
message saying that the vesep file doesn't match the current configuration)

-veSepLogProdCardLimit
The log (base 10) upper limit on a VE sep variable cardinality product.
As mentioned above, sometimes generating the VEsep tables can take a while.
This option limits VE seps to be only those where the (base 10) log of the product of the
cardinalities of the parents is below some threshold, and the threshold
is given by this arg.
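
To pick a sensible threshold you can compute the quantity yourself. A Python
sketch of the arithmetic follows; the function name is made up, and whether
GMTK treats a value exactly equal to the limit as allowed is an assumption
here (this sketch allows it).

```python
import math


def within_card_limit(parent_cards, log10_limit):
    """True if log10 of the product of the parent cardinalities does not
    exceed the limit. Summing logs avoids forming a huge product."""
    log_prod = sum(math.log10(card) for card in parent_cards)
    return log_prod <= log10_limit


print(within_card_limit([10, 100], 3.5))     # product 1e3
print(within_card_limit([1000, 1000], 5.0))  # product 1e6
```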

Note that VE table computation can be significantly sped up by using a
separate structure file just once, for table generation. In this
structure, some of the parents should be observed to the values
that you know are guaranteed to be the ones that satisfy the child (i.e.,
if all satisfying parents have say the third parent == 5, the special
structure file should have that parent observed to be 5).

Put yet another way, let's say you're in case 1 above. If there are
any parents that have only one possible value that (along with other
parent values) will explain C=1, then make those parents observed in
this extra .str file. That way the exact same set of constraints will
be generated in the table.

Once the tables are generated and stored to disk, go back to the old
str file.

Currently, an RV child that is all deterministic and uses switching will
not constitute a VEsep, but that will be added in a couple of weeks.

Best,

-- Jeff

From bilmes@ssli-mail.ee.washington.edu Tue Jan  4 07:58:52 2005
Date: Mon, 03 Jan 2005 23:56:41 -0800
From: Jeff Bilmes <bilmes@ssli-mail.ee.washington.edu>
To: gmtk-users@seagull.ee.washington.edu
Subject: [Gmtk-users] another change dev tag update

Oh, I forgot to mention that the debugging output has been cleaned up
and more information is displayed in a more concise way. I.e., options
of interest are:

-verb 50  - just shows message passings in JT.
-verb 60  - shows above + clique insertions
-verb 65  - show above + iteration starts (including separator, unassigned, & assigned starts).
-verb 70  - show above + all iterations, parent values summarized
-verb 80  - show above but with parent values explicitly shown.

-- Jeff

From karim@cs.washington.edu Thu Jan  6 22:47:08 2005
Date: Thu, 6 Jan 2005 14:46:51 -0800 (PST)
From: Karim Filali <karim@cs.washington.edu>
To: gmtk-users@ssli.ee.washington.edu
Cc: Karim Filali <karim@cs.washington.edu>
Subject: [Gmtk-users] "New" GMTK observation file options summary

Summary of the "new" observation options in GMTK:
------------------------------------------------

--------------------
a) [-fdiffactX <str>] Automatically adjust segment lengths:

The following options are supported to adjust the length of
segments so that they match across all supplied observation files:

<str> =
er : report an error (the default)
rl : repeat the last frame of shorter segments to match the
length of the longest corresponding segment across all
observation input files
rf : repeat the first frame
se : expand segmentally i.e. repeat each frame of the shorter
segments uniformly to match the length of the longest one
ts : truncate longer segments from the beginning so that their
length matches the length of the shortest corresponding segment
across all observation input files
te : truncate from the end
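
To make the frame-level actions concrete, here is a Python sketch of what each
one does to a single segment. It is not the GMTK implementation: the function
name is made up, the caller supplies the target length, and the exact frame
mapping used for "se" is an assumption (each output frame i takes input frame
floor(i * n / target)).

```python
def adjust_frames(frames, target_len, action):
    """Return `frames` stretched or cut to `target_len` frames."""
    n = len(frames)
    if n == target_len:
        return list(frames)
    if action == "er":
        raise ValueError(f"segment has {n} frames, expected {target_len}")
    if n < target_len:
        if action == "rl":   # repeat the last frame
            return list(frames) + [frames[-1]] * (target_len - n)
        if action == "rf":   # repeat the first frame
            return [frames[0]] * (target_len - n) + list(frames)
        if action == "se":   # spread the repeats over the whole segment
            return [frames[i * n // target_len] for i in range(target_len)]
    else:
        if action == "ts":   # drop frames from the beginning
            return list(frames[n - target_len:])
        if action == "te":   # drop frames from the end
            return list(frames[:target_len])
    raise ValueError(f"action {action!r} does not apply here")


print(adjust_frames([0.0, 1.0, 2.0], 4, "rl"))
```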

b) [-sdiffactX <str>] Automatically adjust file lengths:

The following options are supported to adjust the length of
input files so that they are the same:

<str> =
er : report an error
te : truncate longer files from the end (the default)
rl : repeat the last segment of shorter files
wa : wrap around i.e. shorter files will cycle back to the
beginning when a segment number larger than the size of the file
is requested.

These length adjustments take effect after all per-file transformations
are applied and before any global transformation or feature
combination is applied.
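
The file-level actions amount to a mapping from the requested segment number
to a segment that actually exists in the shorter file. A Python sketch of that
mapping follows (illustration only; the function name is made up, and
returning None for "te" to mean "this segment is dropped" is a convention of
this sketch, not of GMTK):

```python
def map_segment(i, num_segments, action):
    """Map requested segment number i onto a file with num_segments
    segments. Returns None when the segment is truncated away."""
    if i < num_segments:
        return i
    if action == "er":
        raise ValueError(f"segment {i} requested, file has {num_segments}")
    if action == "te":       # longer files are cut, so segment i is dropped
        return None
    if action == "rl":       # repeat the last segment
        return num_segments - 1
    if action == "wa":       # wrap around to the beginning
        return i % num_segments
    raise ValueError(f"unknown action {action!r}")


# A file with 2 segments asked for segment number 2:
print(map_segment(2, 2, "wa"), map_segment(2, 2, "rl"), map_segment(2, 2, "te"))
```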

II) Transformations:
------------------

The following transformations are supported.  They are listed in the
order in which they take effect:

a) [-preprX str] Per-stream ("stream" and "file" are used
interchangeably.) frame range: select a range of frames in each
segment of observation file number X (to achieve downsampling for
example or remove parts of the data)

b) [-transX str] Per-stream transformations string.  They are applied
to float features only.

A transformation string has the following format

TRANS1[<float>|<int>|<@filename>][_TRANS2[<float>|<int>|<@filename>]][...]

where TRANSN can take the following values

X :        empty transformation.  Don't do anything.
O<float>:  add a constant offset to all features.  Example: O3.4
M<float>:  multiply by a constant.  Example: M12.5
N:         perform mean and variance normalization at the segment level
E:         perform mean subtraction only
F<@filename>: Apply the filter in filename.  The filter is a vector
of floats.  Example F@"filter_file"
UH<int>:   upsample-hold by <int>.  Example: UH2 repeats each frame twice
US<int>:   upsample-smooth by <int>.  Example: US1 repeats each frame once
and performs smoothing
R<int>:    applies an ARMA filter of order <int>.  Example: R3
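
The string format can be taken apart as in the following Python sketch. It is
not GMTK's parser: the function name is made up, and it assumes that "_" only
ever separates transformations, so a filter file name containing "_" would
not work with this sketch.

```python
import re

# Two-letter codes must be tried before the one-letter ones.
_OP = re.compile(r"^(UH|US|[XOMNEFR])(.*)$")


def parse_trans(spec):
    """Split a transformation string into (code, argument) pairs."""
    ops = []
    for part in spec.split("_"):
        match = _OP.match(part)
        if match is None:
            raise ValueError(f"unknown transformation {part!r}")
        code, arg = match.groups()
        if code in ("O", "M"):
            arg = float(arg)                    # constant offset / multiplier
        elif code in ("UH", "US", "R"):
            arg = int(arg)                      # upsampling factor / filter order
        elif code == "F":
            arg = arg.lstrip("@").strip('"')    # filter file name
        elif arg:
            raise ValueError(f"{code} takes no argument")
        else:
            arg = None                          # X, N, E take no argument
        ops.append((code, arg))
    return ops


print(parse_trans("N_M2.7_O5_F@filter_US1"))
```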

c) [-postprX str] Per-stream frame range after the -transX
transformations are applied

--->  Length adjustments go here  <---

--->  Feature combinations (see below) go here <---

d) [-posttrans str] Final transformation string.  Same format as the
per-stream transformation except it is applied over the global
observation matrix.

e) [-gpr str]  Global final frame range

III) Feature combinations
--------------------------
TWO streams can be combined feature-wise using the option
[-comb str].  Only float features are affected.  The supported
operations are addition (add), subtraction (sub), multiplication (mul),
and division (div).

WARNING: The behavior of this feature when more than two streams are
combined is not well defined.  Same thing for division by zero.

The default is to allow the combination of two streams that have a
different number of float features.  The shorter stream is padded with
zeros.  This can be changed by defining
ALLOW_VARIABLE_DIM_COMBINED_STREAMS to be 0 in
GMTK_ObservationMatrix.h
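
A Python sketch of the padding behavior for one frame follows. This is an
illustration only, not the GMTK code: the function name is made up, the
operation names follow the list above with "add" assumed for addition, and
division by zero simply raises ZeroDivisionError here, whereas the text above
says GMTK's behavior in that case is not well defined.

```python
import operator

OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.truediv,
}


def combine_floats(a, b, op):
    """Combine the float features of two streams for one frame,
    padding the shorter vector with zeros first."""
    n = max(len(a), len(b))
    a = list(a) + [0.0] * (n - len(a))
    b = list(b) + [0.0] * (n - len(b))
    return [OPS[op](x, y) for x, y in zip(a, b)]


# Three floats from one stream, two from the other:
print(combine_floats([1.0, 2.0, 3.0], [10.0, 20.0], "add"))
```

Note that with zero padding, "div" will hit a division by zero whenever the
second stream is the shorter one.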

--------------------------------------------------------------------

Examples:
---------

> obs-print -i1 file1 -ifmt1 ascii -nf1 3 -ni1 2

0 0 0.000000 1.100000 2.200000 1 0
0 1 1.000000 10.100000 23.200001 1 1
0 2 2.000000 100.099998 19.200001 6 7
1 0 33.000000 1.100000 -7.000000 10 4
1 1 -1.000000 -2.100000 23.200001 2 2
2 0 400.000000 71.099998 98.000000 10 4
2 1 -1.000000 0.100000 -56.200001 66 100
2 2 0.040000 0.031000 19.200001 0 9

> obs-print -i1 file2 -ifmt1 ascii -nf1 2 -ni1 2

0 0 555.000000 5763.986816 6 7
0 1 -1.000000 0.100000 334 87
0 2 -0.003400 17.000000 99 56
0 3 -986.500000 88.000000 23 8
1 0 6.000000 100.900002 22 8
1 1 0.900000 0.100000 8 7
1 2 10.900000 9.000000 1 8

*** Unequal segment lengths ***

> obs-print -i1 file1 -ifmt1 ascii -nf1 3 -ni1 2 -i2 file2 -ifmt2
ascii -nf2 2 -ni2 2

ERROR: The number of frames for segment 0 is not the same for
observation files 'file1' and 'file2' (3 vs. 4).  Use the -fdiff
option.

> obs-print -i1 file1 -ifmt1 ascii -nf1 3 -ni1 2 -i2 file2 -ifmt2
ascii -nf2 2 -ni2 2 -fdiff1 rl

0 0 0.000000 1.100000 2.200000 555.000000 5763.986816 1 0 6 7
0 1 1.000000 10.100000 23.200001 -1.000000 0.100000 1 1 334 87
0 2 2.000000 100.099998 19.200001 -0.003400 17.000000 6 7 99 56
0 3 2.000000 100.099998 19.200001 -986.500000 88.000000 6 7 23 8
1 0 33.000000 1.100000 -7.000000 6.000000 100.900002 10 4 22 8
1 1 -1.000000 -2.100000 23.200001 0.900000 0.100000 2 2 8 7
1 2 -1.000000 -2.100000 23.200001 10.900000 9.000000 2 2 1 8

*** Unequal file lengths  ***

By default the longer file is truncated (see above:  result has 2
segments but file1 has 3)

> obs-print -i1 file1 -ifmt1 ascii -nf1 3 -ni1 2 -i2 file2 -ifmt2
ascii -nf2 2 -ni2 2 -fdiff1 rl -sdiff1 wa -sdiff2 wa

0 0 0.000000 1.100000 2.200000 555.000000 5763.986816 1 0 6 7
0 1 1.000000 10.100000 23.200001 -1.000000 0.100000 1 1 334 87
0 2 2.000000 100.099998 19.200001 -0.003400 17.000000 6 7 99 56
0 3 2.000000 100.099998 19.200001 -986.500000 88.000000 6 7 23 8
1 0 33.000000 1.100000 -7.000000 6.000000 100.900002 10 4 22 8
1 1 -1.000000 -2.100000 23.200001 0.900000 0.100000 2 2 8 7
1 2 -1.000000 -2.100000 23.200001 10.900000 9.000000 2 2 1 8
2 0 400.000000 71.099998 98.000000 555.000000 5763.986816 10 4 6 7
2 1 -1.000000 0.100000 -56.200001 -1.000000 0.100000 66 100 334 87
2 2 0.040000 0.031000 19.200001 -0.003400 17.000000 0 9 99 56
2 3 0.040000 0.031000 19.200001 -986.500000 88.000000 0 9 23 8

*** Transformations ***

> obs-print -i1 file1 -ifmt1 ascii -nf1 3 -ni1 2 -i2 file2 -ifmt2 ascii
-nf2 2 -ni2 2 -fdiff1 rl -sdiff1 wa -sdiff2 wa -pre1 0:1 -trans2
"N_M2.7_O5_F@filter_US1" -postpr2 0:3 -posttrans N_O10 -gpr 0:1

0 0 8.500000 8.500000 8.500000 10.603625 11.139734 1 0 6 7
0 1 10.500000 10.500000 10.500000 10.498142 10.397820 1 1 6 7
1 0 11.500000 11.500000 8.500000 10.845489 11.307186 10 4 22 8
1 1 9.500000 9.500000 10.500000 10.476808 10.257824 2 2 22 8
2 0 11.500000 11.500000 11.500000 10.603625 11.139734 10 4 6 7
2 1 9.500000 9.500000 9.500000 10.498142 10.397820 66 100 6 7

where the file "filter" contains the line
1.2 2.4 1.2

*** Combinations  ***

> obs-print -i1 file1 -ifmt1 ascii -nf1 3 -ni1 2 -i2 file2 -ifmt2
ascii -nf2 2 -ni2 2 -fdiff1 rl -sdiff1 wa -sdiff2 wa -pre1 0:1
-trans2 "N_M2.7_O5_F@filter_US1" -postpr2 0:3 -posttrans N_O10 -gpr

0 0 10.494615 10.692422 8.500000 1 0 6 7
0 1 10.556338 10.830382 10.500000 1 1 6 7
1 0 11.481083 11.348219 8.500000 10 4 28 15
1 1 9.651711 10.170420 10.500000 2 2 28 15
2 0 11.499442 11.481358 11.500000 10 4 34 22
2 1 9.521022 9.694231 9.500000 66 100 34 22

From bilmes@ssli-mail.ee.washington.edu Thu Jan  6 22:54:54 2005
Date: Thu, 06 Jan 2005 14:54:42 -0800
From: Jeff Bilmes <bilmes@ssli-mail.ee.washington.edu>
To: gmtk-users@seagull.ee.washington.edu
Subject: Karim Filali: [Gmtk-users] "New" GMTK observation file options summary

Just in case this isn't clear, Karim has done a bunch of work updating
the observation file handling in GMTK as well as a bunch of new programs
(the obs-* programs for handling observation files, which can be any
combination of pfiles, htk, ascii, or binary files).

GMTK now supports combining an arbitrary number of files online (rather
than 3). This is the capital "X" in the argument names: "X" is a number,
so "-iX" really means "-i1", "-i2", etc. GMTK can also combine
observations of different lengths, duplicating observations when segments
in the files have different lengths, so that you don't have to.

Lastly, when you don't need it, many options are no longer required
(e.g., the number of floats being specified in a pfile).

More examples will be forthcoming.

[ Part 2, "forwarded message"  Message/RFC822  10KB. ]

Summary of the "new" observation options in GMTK:
------------------------------------------------

--------------------
a) [-fdiffactX <str>] Automatically adjust segment lengths:

The following options are supported to adjust the length of
segments so that they match across all supplied observation files:

<str> =
er : report an error (the default)
rl : repeat the last frame of shorter segments to match the
length of the longest corresponding segment across all
observation input files
rf : repeat the first frame
se : expand segmentally i.e. repeat each frame of the shorter
segments uniformly to match the length of the longest one
ts : truncate longer segments from the beginning so that their
length matches the length of the shortest corresponding segment
across all observation input files
te : truncate from the end

b) [-sdiffactX <str>] Automatically adjust file lengths:

The following options are supported to adjust the length of
input files so that they are the same:

<str> =
er : report an error
te : truncate longer files from the end (the default)
rl : repeat the last segment of shorter files
wa : wrap around i.e. shorter files will cycle back to the
beginning when a segment number larger than the size of the file
is requested.

These length adjustments take effect after all per-file transformations
are applied and before any global transformation or feature
combination is applied.
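
As a rough illustration of the -fdiffactX actions listed in a) above, here
is a minimal Python sketch (not GMTK code). The function names are
hypothetical, one action is applied to both segments for simplicity, and two
readings are assumptions: "se" maps output frame i to input frame
floor(i*n/target), and "ts" drops frames at the start of the longer segment.

```python
REPEAT_ACTIONS = {"rl", "rf", "se"}   # lengthen the shorter segment
TRUNCATE_ACTIONS = {"ts", "te"}       # shorten the longer segment


def adjust_segment(frames, target_len, action):
    """Return a copy of `frames` resized to `target_len` frames."""
    n = len(frames)
    if n == target_len:
        return list(frames)
    if action == "rl":    # repeat the last frame
        return list(frames) + [frames[-1]] * (target_len - n)
    if action == "rf":    # repeat the first frame
        return [frames[0]] * (target_len - n) + list(frames)
    if action == "se":    # assumed mapping: spread the repeats evenly
        return [frames[i * n // target_len] for i in range(target_len)]
    if action == "te":    # drop frames at the end
        return list(frames[:target_len])
    if action == "ts":    # assumed reading: drop frames at the start
        return list(frames[n - target_len:])
    raise ValueError(f"unknown action: {action!r}")


def match_segments(seg_a, seg_b, action="er"):
    """Make two corresponding segments the same length."""
    if len(seg_a) == len(seg_b):
        return list(seg_a), list(seg_b)
    if action == "er":    # the default: report an error
        raise ValueError(
            f"segment lengths differ ({len(seg_a)} vs. {len(seg_b)})")
    if action in REPEAT_ACTIONS:
        target = max(len(seg_a), len(seg_b))
    elif action in TRUNCATE_ACTIONS:
        target = min(len(seg_a), len(seg_b))
    else:
        raise ValueError(f"unknown action: {action!r}")
    return (adjust_segment(seg_a, target, action),
            adjust_segment(seg_b, target, action))
```

With a 3-frame and a 4-frame segment, "rl" pads the shorter one by repeating
its last frame, which is the behaviour shown for segment 0 in the examples
further down.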

II) Transformations:
------------------

The following transformations are supported.  They are listed in the
order in which they take effect:

a) [-preprX str] Per-stream ("stream" and "file" are used
interchangeably.) frame range: select a range of frames in each
segment of observation file number X (to achieve downsampling for
example or remove parts of the data)

b) [-transX str] Per-stream transformations string.  They are applied
to float features only.

A transformation string has the following format

TRANS1[<float>|<int>|<@filename>][_TRANS2[<float>|<int>|<@filename>]][...]

where TRANSN can take the following values

X :        empty transformation.  Don't do anything.
O<float>:  add a constant offset to all features.  Example: O3.4
M<float>:  multiply by a constant.  Example: M12.5
N:         perform mean and variance normalization at the segment level
E:         perform mean subtraction only
F<@filename>: Apply the filter in filename.  The filter is a vector
of floats.  Example F@"filter_file"
UH<int>:   upsample-hold by <int>.  Example: UH2 repeats each frame twice
US<int>:   upsample-smooth by <int>.  Example: US1 repeats each frame once
and performs smoothing
R<int>:    applies an ARMA filter of order <int>.  Example: R3

c) [-postprX str] Per-stream frame range after the -transX
transformations are applied

--->  Length adjustments go here  <---

--->  Feature combinations (see below) go here <---

d) [-posttrans str] Final transformation string.  Same format as the
per-stream transformation except it is applied over the global
observation matrix.

e) [-gpr str]  Global final frame range
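
To make the transformation string format from b) concrete, here is a small
Python sketch of a parser. It is not GMTK's parser: the function name is
hypothetical, and it assumes that tokens are separated by "_" and that
filter file names contain no underscore (so a name like "filter_file" would
need different handling).

```python
import re

# Two-letter codes are tried before the one-letter codes.
_TOKEN = re.compile(r"(UH|US|[XOMNEFR])(.*)")
_FLOAT_ARG = {"O", "M"}          # offset, multiply
_INT_ARG = {"UH", "US", "R"}     # upsample-hold, upsample-smooth, ARMA order


def parse_transform(spec):
    """Split a string such as 'N_M2.7_O5' into (code, argument) pairs."""
    ops = []
    for token in spec.split("_"):
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognized transformation: {token!r}")
        code, arg = match.groups()
        if code in _FLOAT_ARG:
            ops.append((code, float(arg)))
        elif code in _INT_ARG:
            ops.append((code, int(arg)))
        elif code == "F":
            if len(arg) < 2 or not arg.startswith("@"):
                raise ValueError(f"F needs @filename, got {token!r}")
            ops.append((code, arg[1:].strip('"')))
        else:                    # X, N, E take no argument
            if arg:
                raise ValueError(f"{code} takes no argument: {token!r}")
            ops.append((code, None))
    return ops
```

The pairs come back in the order in which they appear in the string, so a
caller can apply them one after the other.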

III) Feature combinations
--------------------------
TWO streams can be combined feature-wise using the option
[-comb str].  Only float features are affected.  The supported
operations are addition (add), subtraction (sub), multiplication (mul),
and division (div).

WARNING: The behavior of this feature when more than two streams are
combined is not well defined.  Same thing for division by zero.

The default is to allow the combination of two streams that have a
different number of float features.  The shorter stream is padded with
zeros.  This can be changed by defining
ALLOW_VARIABLE_DIM_COMBINED_STREAMS to be 0 in
GMTK_ObservationMatrix.h

--------------------------------------------------------------------

Examples:
---------

> obs-print -i1 file1 -ifmt1 ascii -nf1 3 -ni1 2

0 0 0.000000 1.100000 2.200000 1 0
0 1 1.000000 10.100000 23.200001 1 1
0 2 2.000000 100.099998 19.200001 6 7
1 0 33.000000 1.100000 -7.000000 10 4
1 1 -1.000000 -2.100000 23.200001 2 2
2 0 400.000000 71.099998 98.000000 10 4
2 1 -1.000000 0.100000 -56.200001 66 100
2 2 0.040000 0.031000 19.200001 0 9

> obs-print -i1 file2 -ifmt1 ascii -nf1 2 -ni1 2

0 0 555.000000 5763.986816 6 7
0 1 -1.000000 0.100000 334 87
0 2 -0.003400 17.000000 99 56
0 3 -986.500000 88.000000 23 8
1 0 6.000000 100.900002 22 8
1 1 0.900000 0.100000 8 7
1 2 10.900000 9.000000 1 8

*** Unequal segment lengths ***

> obs-print -i1 file1 -ifmt1 ascii -nf1 3 -ni1 2 -i2 file2 -ifmt2
ascii -nf2 2 -ni2 2

ERROR: The number of frames for segment 0 is not the same for
observation files 'file1' and 'file2' (3 vs. 4).  Use the -fdiff
option.

> obs-print -i1 file1 -ifmt1 ascii -nf1 3 -ni1 2 -i2 file2 -ifmt2
ascii -nf2 2 -ni2 2 -fdiff1 rl

0 0 0.000000 1.100000 2.200000 555.000000 5763.986816 1 0 6 7
0 1 1.000000 10.100000 23.200001 -1.000000 0.100000 1 1 334 87
0 2 2.000000 100.099998 19.200001 -0.003400 17.000000 6 7 99 56
0 3 2.000000 100.099998 19.200001 -986.500000 88.000000 6 7 23 8
1 0 33.000000 1.100000 -7.000000 6.000000 100.900002 10 4 22 8
1 1 -1.000000 -2.100000 23.200001 0.900000 0.100000 2 2 8 7
1 2 -1.000000 -2.100000 23.200001 10.900000 9.000000 2 2 1 8

*** Unequal file lengths  ***

By default the longer file is truncated (see above:  result has 2
segments but file1 has 3)

> obs-print -i1 file1 -ifmt1 ascii -nf1 3 -ni1 2 -i2 file2 -ifmt2
ascii -nf2 2 -ni2 2 -fdiff1 rl -sdiff1 wa -sdiff2 wa

0 0 0.000000 1.100000 2.200000 555.000000 5763.986816 1 0 6 7
0 1 1.000000 10.100000 23.200001 -1.000000 0.100000 1 1 334 87
0 2 2.000000 100.099998 19.200001 -0.003400 17.000000 6 7 99 56
0 3 2.000000 100.099998 19.200001 -986.500000 88.000000 6 7 23 8
1 0 33.000000 1.100000 -7.000000 6.000000 100.900002 10 4 22 8
1 1 -1.000000 -2.100000 23.200001 0.900000 0.100000 2 2 8 7
1 2 -1.000000 -2.100000 23.200001 10.900000 9.000000 2 2 1 8
2 0 400.000000 71.099998 98.000000 555.000000 5763.986816 10 4 6 7
2 1 -1.000000 0.100000 -56.200001 -1.000000 0.100000 66 100 334 87
2 2 0.040000 0.031000 19.200001 -0.003400 17.000000 0 9 99 56
2 3 0.040000 0.031000 19.200001 -986.500000 88.000000 0 9 23 8
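
The file-length actions can be pictured as a mapping from output segment
numbers to segment numbers in each input file. The Python sketch below is
only an illustration: the function name is hypothetical, and it applies a
single action to all files, whereas the real option is given per file.

```python
def segment_plan(counts, action="te"):
    """For each output segment, list which segment of each file is read.

    counts: number of segments in each input file, e.g. [3, 2].
    """
    if action == "er" and len(set(counts)) > 1:
        raise ValueError(f"files have different lengths: {counts}")
    # "te" truncates to the shortest file; the others extend to the longest.
    total = min(counts) if action == "te" else max(counts)
    plan = []
    for seg in range(total):
        row = []
        for n in counts:
            if seg < n:
                row.append(seg)        # segment exists in this file
            elif action == "wa":
                row.append(seg % n)    # wrap around to the beginning
            elif action == "rl":
                row.append(n - 1)      # repeat the last segment
            else:
                raise ValueError(f"unknown action: {action!r}")
        plan.append(tuple(row))
    return plan
```

For a 3-segment file1 and a 2-segment file2, "wa" pairs output segment 2
with segment 0 of file2, as in the listing above.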

*** Transformations ***

> obs-print -i1 file1 -ifmt1 ascii -nf1 3 -ni1 2 -i2 file2 -ifmt2 ascii
-nf2 2 -ni2 2 -fdiff1 rl -sdiff1 wa -sdiff2 wa -pre1 0:1 -trans2
"N_M2.7_O5_F@filter_US1" -postpr2 0:3 -posttrans N_O10 -gpr 0:1

0 0 8.500000 8.500000 8.500000 10.603625 11.139734 1 0 6 7
0 1 10.500000 10.500000 10.500000 10.498142 10.397820 1 1 6 7
1 0 11.500000 11.500000 8.500000 10.845489 11.307186 10 4 22 8
1 1 9.500000 9.500000 10.500000 10.476808 10.257824 2 2 22 8
2 0 11.500000 11.500000 11.500000 10.603625 11.139734 10 4 6 7
2 1 9.500000 9.500000 9.500000 10.498142 10.397820 66 100 6 7

where the file "filter" contains the line
1.2 2.4 1.2

*** Combinations  ***

> obs-print -i1 file1 -ifmt1 ascii -nf1 3 -ni1 2 -i2 file2 -ifmt2
ascii -nf2 2 -ni2 2 -fdiff1 rl -sdiff1 wa -sdiff2 wa -pre1 0:1
-trans2 "N_M2.7_O5_F@filter_US1" -postpr2 0:3 -posttrans N_O10 -gpr

0 0 10.494615 10.692422 8.500000 1 0 6 7
0 1 10.556338 10.830382 10.500000 1 1 6 7
1 0 11.481083 11.348219 8.500000 10 4 28 15
1 1 9.651711 10.170420 10.500000 2 2 28 15
2 0 11.499442 11.481358 11.500000 10 4 34 22
2 1 9.521022 9.694231 9.500000 66 100 34 22

_______________________________________________
Gmtk-users mailing list
Gmtk-users@ssli.ee.washington.edu
https://ssli.ee.washington.edu/mailman/listinfo/gmtk-users

From klivescu@csail.mit.edu Tue Jan 11 00:06:57 2005
Date: Mon, 10 Jan 2005 19:06:54 -0500
From: klivescu@csail.mit.edu
To: Joe Frankel <joe@cstr.ed.ac.uk>
Subject: Re: virtual evidence in gmtk

Hi Joe,

No problem.  By "code fragment" I assume you mean the relevant parts of
structure and parameter files, right?  Here is an example from one of my
experiments.  To give some context, here I have a variable for the underlying
(target) degree of lip opening, LIP-OPEN, a variable for the surface degree of
opening, actualLIP-OPEN, and a classifier for lip opening whose output has been
converted to a "likelihood" P(observations|actualLIP-OPEN) which serves as the
virtual evidence.

Here's the structure file snippet:

-------------

variable : LIP-OPEN {
type: discrete hidden cardinality LIP_OPEN_CARD;
switchingparents: LIP-OPENSpecified(0) using
mapping("directMappingWithOneParent");
conditionalparents: LIP-OPENPhone(0) using DenseCPT("LIP-OPENFrame0CPT")
|
LIP-OPENPhone(0) using DeterministicCPT("phoneme2LIP-OPENDetCPT");
}

variable : actualLIP-OPEN {
type: discrete hidden cardinality LIP_OPEN_CARD;
switchingparents: nil;
conditionalparents: LIP-OPEN(0) using DenseCPT("LIP-OPENDenseCPT");
}

variable : VE_LIP-OPEN {
type: discrete observed value 1 cardinality 2;
weight: scale WGT;
switchingparents: nil;
conditionalparents: actualLIP-OPEN(0) using
VirtualEvidenceCPT("LIP_OPEN_VECPT");
}

-------------

Here is the master file snippet:

-------------

VE_CPT_IN_FILE inline

1

0
LIP_OPEN_VECPT
1 % num par
LIP_OPEN_CARD % par card
2 % self card
VE_FILE_LIP_OPEN
LIP_OPEN_CARD % nfloats
0 % nints
all % float range
all % int range
all % must be all
ascii
F % endian swapping condition

-------------

And here are the first few frames of a virtual evidence data file:

-------------

-1.393021 -2.836323 -0.384027 -4.429328
-1.527227 -3.297594 -0.308737 -4.463940
-1.527227 -3.297594 -0.308737 -4.463940
-1.527227 -3.297594 -0.308737 -4.463940
-1.498988 -3.416047 -0.308566 -4.677438
-1.498988 -3.416047 -0.308566 -4.677438
-1.498988 -3.416047 -0.308566 -4.677438
-1.498988 -3.416047 -0.308566 -4.677438
-1.363664 -3.801153 -0.338016 -4.738805
-1.363664 -3.801153 -0.338016 -4.738805
-1.363664 -3.801153 -0.338016 -4.738805

-------------

Where the line for frame n contains

logP(obs_n|actualLIP-OPEN_n=0) logP(obs_n|actualLIP-OPEN_n=1)
logP(obs_n|actualLIP-OPEN_n=2) logP(obs_n|actualLIP-OPEN_n=3)
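
For anyone producing such a file from classifier outputs, a minimal Python
sketch of writing and checking one line follows. The function names are
hypothetical. It assumes natural logs: the first sample row above
exponentiates to values that sum to roughly 1, which is consistent with
that, but the base is not stated in this message.

```python
import math


def ve_line(likelihoods):
    """Format one frame: one log score per value of the parent variable."""
    return " ".join(f"{math.log(p):.6f}" for p in likelihoods)


def parse_ve_line(line, parent_card):
    """Read one frame back and check it has one score per parent value."""
    scores = [float(field) for field in line.split()]
    if len(scores) != parent_card:
        raise ValueError(
            f"expected {parent_card} scores, found {len(scores)}")
    return scores
```

Here parent_card plays the role of LIP_OPEN_CARD, which is 4 in the sample
data since each line carries four scores.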

Does that help?

Karen

From bilmes@ssli-mail.ee.washington.edu Tue Mar  1 10:25:14 2005
Date: Tue, 01 Mar 2005 02:25:04 -0800
From: Jeff Bilmes <bilmes@ssli-mail.ee.washington.edu>
To: gmtk-users@seagull.ee.washington.edu
Subject: [Gmtk-users] dev tag updated

Folks,

I've updated the gmtk dev tag.

Changes:
- new option: -cpbeam, which does partial clique beam pruning. This can
be used in combination with the other beam options (namely, -cbeam,
-crbeam, -ckbeam, and -sbeam).  It in particular can be used to prune
states away when Gaussians are ordered using -vcap so that they are
early. In other words, -cpbeam will prune away a clique as it is
being created based on an estimate of what in the future will be the
maximum clique value. -cpbeam can also reduce memory. The argument to
-cpbeam is of the same form as -cbeam. Lastly, -cpbeam only applies
to the C' partition (not P' or E').
- various minor speedups regarding clique packing,
- various bug fixes.

-- Jeff

+======================================================================+
|            Jeff A. Bilmes, Assistant Professor                       |
| Dept. of EE              Voice: (206) 221-5236                       |
| University of Washington FAX: (206) 543-3842                         |
| Box 352500               Email: bilmes@ee.washington.edu             |
| Seattle, WA  98195-2500  http://www.ee.washington.edu/faculty/bilmes |
+======================================================================+
_______________________________________________
Gmtk-users mailing list
Gmtk-users@ssli.ee.washington.edu
https://ssli.ee.washington.edu/mailman/listinfo/gmtk-users

From bilmes@ssli-mail.ee.washington.edu Sun Mar  6 03:21:09 2005
Date: Sat, 05 Mar 2005 19:20:59 -0800
From: Jeff Bilmes <bilmes@ssli-mail.ee.washington.edu>
To: gmtk-users@seagull.ee.washington.edu
Subject: [Gmtk-users] dev tag updated (please get)

Yet another dev tag update (sorry).

Bug fixes:
- the previous dev tag update had a bug which might cause some graphs to run slower.
Update to this one to fix that bug and also get ...

Enhancements/Speedups:
- new clique sorting options ('-vcap' M, T, '+', and '.'). The new default -vcap seems
to speed up inference by 10-20% on most graphs.
The other -vcap options '+' and '.' are by arbitrary and file position order (useful for debugging your models).

-- Jeff

+======================================================================+
|            Jeff A. Bilmes, Assistant Professor                       |
| Dept. of EE              Voice: (206) 221-5236                       |
| University of Washington FAX: (206) 543-3842                         |
| Box 352500               Email: bilmes@ee.washington.edu             |
| Seattle, WA  98195-2500  http://www.ee.washington.edu/faculty/bilmes |
+======================================================================+
_______________________________________________
Gmtk-users mailing list
Gmtk-users@ssli.ee.washington.edu
https://ssli.ee.washington.edu/mailman/listinfo/gmtk-users

From bilmes@ssli-mail.ee.washington.edu Thu Mar  3 08:08:47 2005
Date: Thu, 03 Mar 2005 00:08:38 -0800
From: Jeff Bilmes <bilmes@ssli-mail.ee.washington.edu>
To: gmtk-users@seagull.ee.washington.edu
Subject: [Gmtk-users] on most recent dev tag

Folks,

A few more details on yesterday's (short) dev tag update message:

- The new "-cpbeam" option is really different than all previous beam
options. You might ask, why is this beam pruning option different from
all other beam pruning options?  Well, basically, all previous beam
options construct the entire clique and only then will they start
pruning it down, so it means you need to have enough memory to hold at
least one clique. While the clique might be "pruned" via previous
pruning (via the separators that have been pruned with either -sbeam
or indirectly via previous clique -cbeams), there were still cases
where the clique construction itself gets large.

-cpbeam fixes that. Basically, while constructing a clique, if partial entries
in the clique fall below the currently estimated max clique value, then the
rest of the clique entry is not computed. Since cliques are essentially formed
like a depth-first tree traversal, early pruning with -cpbeam can have a large
effect in trimming off large portions of the tree (so they are never constructed).
This is why this option can help to decrease memory requirements as well.
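
The idea can be sketched in a few lines of Python. This is a toy model, not
GMTK's inner loop: the function name is hypothetical, scores are kept in the
log domain, and the pruning rule (drop a partial entry once it falls more
than `beam` below the estimated maximum) is an assumed form of the test.

```python
def expand_clique(factors, est_max_log, beam):
    """Depth-first construction of clique entries with partial pruning.

    factors: one list of log scores per variable (one score per value).
    Returns (kept_entries, nodes_visited).
    """
    kept = []
    visited = 0

    def recurse(depth, partial, assignment):
        nonlocal visited
        visited += 1
        # Partial pruning: abandon the whole subtree below this point.
        if partial < est_max_log - beam:
            return
        if depth == len(factors):
            kept.append((tuple(assignment), partial))
            return
        for value, score in enumerate(factors[depth]):
            recurse(depth + 1, partial + score, assignment + [value])

    recurse(0, 0.0, [])
    return kept, visited
```

With three binary variables where one value of the first variable scores
very low, the pruned run never visits the subtree under that value, so both
the work and the number of stored entries go down.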

There are two potential caveats, however, that I might as well mention
now. First, since we're beam pruning a clique before we know the
clique's max value, we have to estimate the clique's max value. This
is done by the two preceding instances of the clique to the left. This
can be done for C' but since there is only ever one P' or E' per
utterance, P' and E' don't currently benefit from -cpbeam (although it
would be possible to estimate from utterance to utterance which I'll
give a try at some point). So far, however, the simple estimation
algorithm seems to be amazingly good (better than I expected) so this
probably won't be a problem.

Secondly, -cpbeam works by assuming that later entries in the clique
will always *reduce* the current probability.  For example, suppose
a clique entry consists of multiplying 5 probabilities:

p1*p2*p3*p4*p5

If -cpbeam finds that p1*p2*p3 falls below a threshold, it need not bother with p4 or p5,
but this really is only true if p4 <=1 and p5 <= 1. While this is true for probabilities,
if p4 or p5 are Gaussian scores, we might have p4 or p5 > 1. This can happen if you have
Gaussians with very small variances but you have observations that are close to the mean.
Therefore it is important to ensure that your data is scaled correctly. There are several ways
to do this:

1) always globally variance normalize your acoustic feature vectors (unity variance should
be sufficient, but to be safer, you might even want to normalize to say variance 10 or 100,
it really depends on the data).
2) use a higher variance floor parameter
3) scale the data on training/testing.

Note that options 1 and 3 are easy now that we have Karim's front end in place, which can
do all kinds of feature transformations online (see message from a few weeks ago on command
line syntax).
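
To see why small variances are a problem, here is the one-dimensional
Gaussian density in Python (a plain textbook formula, nothing
GMTK-specific). At the mean, a variance of 0.001 gives a density of roughly
12.6, while unit variance gives roughly 0.4.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a one-dimensional Gaussian at x."""
    norm = math.sqrt(2.0 * math.pi * variance)
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / norm


tight = gaussian_pdf(0.0, 0.0, 1e-3)   # small variance, observation at mean
unit = gaussian_pdf(0.0, 0.0, 1.0)     # unit variance, observation at mean
```

A score like `tight` multiplies the partial product by more than one, which
breaks the assumption that later factors can only reduce it.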

So to summarize, I do hope that -cpbeam makes everyone's life much easier :-)

Other changes in yesterday's dev tag:

- trace messages for reading in files are now turned on with -verb 59 rather than -verb 90
(-verb 60 is the level at which you start getting a ton of messages)

- -vcap now supports 'S' (which sorts if a variable is switching or not)

Optimization options (skip if you don't want details):

- some triangulations involved variables with the "disposition 4" (meaning a
clique continuation). It turns out that this variable disposition was
unnecessary, and by removing it I have achieved a speedup on those graphs.
I've seen speedups of 10-15% but it could be much more, depending on the
triangulation. In general, the triangulations with more disp4's will be sped up more.

- I secretly updated the dev tag earlier in Feb so that clique
packing is much faster (it turns out that you can organize the
packing to almost optimally not span word boundaries, which is what
is done now). In the best of cases, this can give another 20-40%
speedup. If you haven't updated lately, you'll probably notice this
change (except when cliques are small).

more speedups on the way ...

-- Jeff

+======================================================================+
|            Jeff A. Bilmes, Assistant Professor                       |
| Dept. of EE              Voice: (206) 221-5236                       |
| University of Washington FAX: (206) 543-3842                         |
| Box 352500               Email: bilmes@ee.washington.edu             |
| Seattle, WA  98195-2500  http://www.ee.washington.edu/faculty/bilmes |
+======================================================================+
_______________________________________________
Gmtk-users mailing list
Gmtk-users@ssli.ee.washington.edu
https://ssli.ee.washington.edu/mailman/listinfo/gmtk-users

From bilmes@ssli-mail.ee.washington.edu Wed Jun  1 09:31:11 2005
Date: Wed, 01 Jun 2005 01:31:01 -0700
From: Jeff Bilmes <bilmes@ssli-mail.ee.washington.edu>
To: gmtk-users@seagull.ee.washington.edu
Subject: [Gmtk-users] dev tag update

Folks,

Another update has occurred. In this latest one, the new hash tables are
in (at last). You'll see a small but consistent speedup over the old
hash tables.

Moreover, you now have control over the hash table load factor via the
"-hashLoadFactor lf" argument, where 0.1 <= lf <= 0.99 is the range of reasonable
values. Increasing lf (e.g., lf=0.90) means that the hash table will run
slower but use up less memory, and decreasing lf means things will run
faster but will use up more memory. Note that if lf gets too small,
things will start running *much* slower since you'll be thrashing.

Also beware that things can run a bit unexpectedly. If you are working with
inherently small state spaces, and you increase lf, things might even
run faster since everything might start fitting in cache.

The speed doesn't appear to get that much slower when you increase lf, but
in some cases the memory can be reduced quite a bit (for big state space
usage when memory is a problem, you might try something like
"-hashLoadFactor 0.98" which at best might give you a factor 2 reduction
in memory without too much of a speed hit).
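
As a back-of-the-envelope illustration of the memory side of this
trade-off, here is a Python sketch using the generic model of a hash table
that keeps entries/slots at or below the load factor. The function name is
hypothetical, this is not how GMTK sizes its tables internally, and the 0.5
comparison point is arbitrary (it is not claimed to be GMTK's default).

```python
import math


def slots_needed(num_entries, load_factor):
    """Smallest table size that keeps entries/slots <= load_factor."""
    if not 0.0 < load_factor <= 1.0:
        raise ValueError("load factor must be in (0, 1]")
    return math.ceil(num_entries / load_factor)


sparse = slots_needed(1000, 0.5)    # more slots, fewer collisions
dense = slots_needed(1000, 0.98)    # fewer slots, more collisions
```

Under this model, going from 0.5 to 0.98 roughly halves the number of slots
for the same number of entries.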

-- Jeff

+======================================================================+
|            Jeff A. Bilmes, Assistant Professor                       |
| Dept. of EE              Voice: (206) 221-5236                       |
| University of Washington FAX: (206) 543-3842                         |
| Box 352500               Email: bilmes@ee.washington.edu             |
| Seattle, WA  98195-2500  http://www.ee.washington.edu/faculty/bilmes |
+======================================================================+
_______________________________________________
Gmtk-users mailing list
Gmtk-users@ssli.ee.washington.edu
https://ssli.ee.washington.edu/mailman/listinfo/gmtk-users

From bilmes@ssli-mail.ee.washington.edu Fri May 27 10:33:09 2005
Date: Fri, 27 May 2005 02:32:59 -0700
From: Jeff Bilmes <bilmes@ssli-mail.ee.washington.edu>
To: gmtk-users@seagull.ee.washington.edu
Subject: [Gmtk-users] gmtk dev tag updated

Hi Everyone,

In case you haven't updated in a while, a number of dev tag updates have
occurred recently. Most of them have been bug fixes, but a few new features
have been added that might as well be mentioned at this point:

1) GMTK now supports HTK lattices (thanks to Gang)
2) A new all_unequal() decision tree function is available
(i.e., all_unequal(p1,p2,p3,p4) returns one only when none of the parents are equal to each other).
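
In Python terms, the behaviour described for all_unequal() is roughly the
following (a sketch of the semantics only, not the decision tree
implementation):

```python
def all_unequal(*parents):
    """Return 1 when no two parent values are equal, otherwise 0."""
    return 1 if len(set(parents)) == len(parents) else 0
```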

In case anyone is wondering, the next set of features to be added will
be fast N-best lists, sampling, and then a re-done inference inner loop
that should give us some additional significant speedups.

Best,

-- Jeff

+======================================================================+
|            Jeff A. Bilmes, Assistant Professor                       |
| Dept. of EE              Voice: (206) 221-5236                       |
| University of Washington FAX: (206) 543-3842                         |
| Box 352500               Email: bilmes@ee.washington.edu             |
| Seattle, WA  98195-2500  http://www.ee.washington.edu/faculty/bilmes |
+======================================================================+
_______________________________________________
Gmtk-users mailing list
Gmtk-users@ssli.ee.washington.edu
https://ssli.ee.washington.edu/mailman/listinfo/gmtk-users

From bilmes@ssli-mail.ee.washington.edu Fri May 27 10:33:55 2005
Date: Fri, 27 May 2005 02:33:50 -0700
From: Jeff Bilmes <bilmes@ssli-mail.ee.washington.edu>
To: gmtk-users@seagull.ee.washington.edu
Subject: [Gmtk-users] Sorry, I forgot...

forgot to mention that the all_unequal() function is thanks to Chris!

:-)

-- Jeff

+======================================================================+
|            Jeff A. Bilmes, Assistant Professor                       |
| Dept. of EE              Voice: (206) 221-5236                       |
| University of Washington FAX: (206) 543-3842                         |
| Box 352500               Email: bilmes@ee.washington.edu             |
| Seattle, WA  98195-2500  http://www.ee.washington.edu/faculty/bilmes |
+======================================================================+
_______________________________________________
Gmtk-users mailing list
Gmtk-users@ssli.ee.washington.edu
https://ssli.ee.washington.edu/mailman/listinfo/gmtk-users

Jeff Bilmes bilmes at ssli-mail.ee.washington.edu
Sat Jul 23 01:19:09 PDT 2005

Two things of note in the GMTK world, namely Bayesian Dirichlet priors
and graph visualization.

=================================================================-

First, for those wishing just to get the latest static binaries, see:

http://ssli.ee.washington.edu/~bilmes/gmtk/linux/FriJul22_2005/

=================================================================-

GMTK now supports Bayesian Dirichlet priors inside of EM training for
multinomial distributions.  Specifically, Dirichlet priors can be used
for DenseCPTs, DPMFs, SparseCPTs (indirectly via DPMFs), and mixture
responsibilities (again via DPMFs).

Dense CPTs:

There are two ways to use Dirichlet priors.

Case A: One way is to have a *constant* count value added to all CPT
accumulators. This means that during learning, the expected counts for
the CPT entries will have an added constant value of 'alpha':

P(C=j|parents) = (1/Z) ( E[ counts of j | parents ] + alpha )

where alpha is the constant Dirichlet hyperparameter (count), and where
Z is a normalization constant. Without Dirichlet smoothing, it is always
the case that alpha == 0.

Case B: Another way is to have a table of count values for the CPT, as
in:

P(C=j|parents) = (1/Z) ( E[ counts of j | parents ] + alpha(j,parents) )

so the Dirichlet counts consist of a multi-dimensional table that is the
same size and dimensionality as the CPT. The change is entirely
backwardly compatible with existing DenseCPTs and DPMFs, so if you don't
want to use them, no changes are needed.
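
Both cases amount to the same update for one row of the CPT (one setting of
the parents). A minimal Python sketch, with a hypothetical function name and
made-up counts, and not the actual GMTK accumulator code:

```python
def smoothed_row(expected_counts, alpha):
    """Return P(C=j | parents) for one row of a CPT.

    alpha is either a single number (Case A) or one count per child
    value (Case B, i.e. one row of the Dirichlet table).
    """
    if isinstance(alpha, (int, float)):
        alpha = [alpha] * len(expected_counts)
    if len(alpha) != len(expected_counts):
        raise ValueError("alpha row and count row differ in length")
    if any(a < 0 for a in alpha):
        raise ValueError("Dirichlet counts must be >= 0")
    totals = [c + a for c, a in zip(expected_counts, alpha)]
    z = sum(totals)                       # the normalization constant Z
    if z == 0:
        raise ValueError("row has no counts at all")
    return [t / z for t in totals]
```

With alpha == 0 this reduces to the usual normalized expected counts.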

Here is the syntax for the above two cases:

Case A:

Here is a dense CPT with a constant Dirichlet count of 5:

---------------------------------------------
1 % cpt number 1
state1_with_state_pars % name
1 % num parents
3 3 % cards
DirichletConst 5
0.3333 0.3333 0.3333
0.3333 0.3333 0.3333
0.3333 0.3333 0.3333
---------------------------------------------

In other words, you can optionally include a 'DirichletConst v' before
the Dense CPT probabilities, where v is the shared Dirichlet
hyperparameter for all values of the child RV and for all parent
values. The value 'v' is a floating point value (so fractional counts
are allowed), must be >= 0, and is a real count value (i.e., it is not in
log form).

Case B:

Here, you associate a Dirichlet Table with a CPT, as follows:

-----------------------
1 % cpt number 1
state1_with_state_pars % name
1 % num parents
3 3 % cards
DirichletTable 3x3dirichlet_tab
0.3333 0.3333 0.3333
0.3333 0.3333 0.3333
0.3333 0.3333 0.3333
-----------------------

where '3x3dirichlet_tab' is the name of a new GMTK object, of type
Dirichlet Table (these will be described below).

---------------------------
DPMFs/SparseCPTs:

Recall that DPMFs are used for both (Gaussian) mixture
responsibilities, and also for the non-zero values of the rows of Sparse
CPTs. For using Dirichlet priors with mixture responsibilities or
SparseCPTs, the Dirichlet priors are associated with the underlying DPMF.

Also recall that when DPMFs are used for responsibilities, they can
change length (i.e., when a mixture split or vanish occurs). Therefore,
if a split/vanish occurs, and it is currently using a DirichletTable, if
the length no longer matches, then the DirichletTable is no longer used
(i.e., GMTK will turn off smoothing). With high enough verbosity
(currently 50), you'll get a warning message when this occurs.

The syntax for DPMFs is very similar to DenseCPTs, specifically, we can
either have:

-----------------------
0 % pmf number 0
gmMixWeight0 2 % name, cardinality
DirichletConst 10
0.5 0.5
-----------------------

or

-----------------------
0 % pmf number 0
gmMixWeight0 2 % name, cardinality
DirichletTable table-name
0.5 0.5
-----------------------

---------------------------------------------

DirichletTables are a new GMTK object that store the Dirichlet prior
hyperparameters for each child RV value for all parent values. The
format is basically an N-dimensional table. Here are two examples:

---------------------------------------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Dirichlet Tables
DIRICHLET_TAB_IN_FILE inline
2

0 % dirichlet number 0
3dirichlet_tab % name
1 % table dimensionality
3 % dimensionalities
% meta counts
5 4 3

1 % dirichlet number 1
3x3dirichlet_tab % name
2 % table dimensionality
3 3 % dimensionalities
% meta counts
1e2 0   0
0   1e2 0
0   0  1e2
---------------------------------------------------------------------------------

The first table (named '3dirichlet_tab') defines a 1-dimensional length 3 table
with prior counts '5 4 3'. Such a table can be either used with a 3-element
DPMF, or a DenseCPT with no parents and child cardinality of 3.

The second table '3x3dirichlet_tab' defines a 2-dimensional 3x3
table. If this was a 3x3 Markov transition matrix, such priors would
encourage the states to do a self-transition (since the table is
diagonal).

When a Dirichlet Table is associated with a DenseCPT or DPMF, the
dimensionalities must match *exactly*! Also, the data order for each
dimension is exactly the same as for a DenseCPT. This means that if a
DenseCPT has d parents, the DirichletTable must have dimensionality
(d+1). All but the last dimension length of the DirichletTable must
match the cardinality of the corresponding parent of the DenseCPT, and
the last dimension length of the DirichletTable must match the
cardinality of the child used in the DenseCPT.

Note that, like 'DirichletConst', the table values are actual
real-valued counts, and that they must be >= 0.
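
The matching rule can be written down as a small check. This is only a
Python sketch with a hypothetical function name, useful for validating
parameter files before training; GMTK does its own checking.

```python
def dirichlet_table_matches(table_dims, parent_cards, child_card):
    """True when a DirichletTable's dimensions fit a DenseCPT.

    table_dims:   dimension lengths of the DirichletTable, in order.
    parent_cards: cardinalities of the CPT's parents, in order.
    child_card:   cardinality of the child variable.
    """
    return list(table_dims) == list(parent_cards) + [child_card]
```

For example, the '3x3dirichlet_tab' above fits a DenseCPT with one parent
of cardinality 3 and a child of cardinality 3, and '3dirichlet_tab' fits a
parentless CPT (or DPMF) of cardinality 3.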

Training: lastly, in order to get all this to work, you must use the
gmtkEMTrainNew program, and you must give the command line option
'-dirichletPriors T'. The default value for the command line parameter
is false.

The reason for this command line option is that if you are using GMTK in
parallel training mode with Dirichlet priors, you only want to turn on priors for
one process rather than all of them so that the prior counts don't get
counted as many times as you have divided the training into parallel
chunks. Thus, your parallel script would call something like:

gmtkEMTrainnew -dirichletPriors T  {parallel chunk 1 parameters}
gmtkEMTrainnew -dirichletPriors F  {parallel chunk 2 parameters}
gmtkEMTrainnew -dirichletPriors F  {parallel chunk 3 parameters}
...
gmtkEMTrainnew -dirichletPriors F  {parallel chunk N parameters}

It of course doesn't matter which chunk gets the '-dirichletPriors T' as
long as only one of them does. If you are not using Dirichlet priors in
any of your CPTs or DPMFs, then this command line option has no effect.
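
A parallel launcher could build its command lines along these lines. This
is a Python sketch with a hypothetical helper name; the per-chunk arguments
are placeholders, and the program name is simply copied from the command
lines above.

```python
def build_commands(chunk_args, program="gmtkEMTrainnew", prior_chunk=0):
    """Build one argument list per chunk; only one chunk enables priors."""
    commands = []
    for index, args in enumerate(chunk_args):
        flag = "T" if index == prior_chunk else "F"
        commands.append([program, "-dirichletPriors", flag, *args])
    return commands


# Placeholder arguments standing in for the real per-chunk parameters.
commands = build_commands([["<chunk 1 parameters>"],
                           ["<chunk 2 parameters>"],
                           ["<chunk 3 parameters>"]])
```

Each list can then be handed to whatever job submission mechanism the
parallel script already uses.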

=================================================================-

There has been substantial progress on gmtkViz, the GMTK visualization
tool.  Alex Norman <alex at neisis.net> has been working on extending it and
fixing bugs.

gmtkViz is not made by default since it depends on having the library
wxWidgets available. When you check out the latest version, you can try
a 'make gmtkViz' to make the gmtkViz program.

As I mentioned before, gmtkViz does not do graph layout for you, but
makes it really easy to do so, and you can export postscript to include

The latest changes have fixed many bugs (including printing) and added a
ton of new features. There is also an online quick-reference guide when
<alex at neisis.net>, but only for the next few weeks since he'll be
leaving on Aug 19th. Please do try it out if you can though.

-- Jeff

+======================================================================+
|            Jeff A. Bilmes, Associate Professor                       |
| Dept. of EE              Voice: (206) 221-5236                       |
| University of Washington FAX: (206) 543-3842                         |
| Box 352500               Email: bilmes at ee.washington.edu             |
| Seattle, WA  98195-2500  http://www.ee.washington.edu/faculty/bilmes |
+======================================================================+



Topic revision: r1 - 11 May 2006 - 08:34:39 - SimonKing

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.