
-- Main.simonk - 15 Feb 2007

Barbara Forbes: articulatory feature recognition


Test a new feature representation by training ANNs to detect the features in speech, then compare the results to previous experiments that used other feature systems


Add some references here



TIMIT is at /group/corpora/public/timit/original

Important note about TIMIT: do NOT use the sa1 and sa2 files (the 'shibboleth' utterances); only use the sx and si utterances (8 of these in total per speaker)
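A minimal Python sketch of this file selection, assuming the standard subset -> dialect-region -> speaker directory layout and lowercase .wav filenames under the corpus root above:

<verbatim>
import glob
import os

TIMIT_ROOT = "/group/corpora/public/timit/original"

def usable_waveforms(subset="train"):
    """Return .wav paths for the sx/si utterances only (8 per speaker),
    skipping the sa1/sa2 'shibboleth' sentences."""
    pattern = os.path.join(TIMIT_ROOT, subset, "*", "*", "*.wav")
    return [f for f in sorted(glob.glob(pattern))
            if not os.path.basename(f).startswith("sa")]

if __name__ == "__main__":
    wavs = usable_waveforms("train")
    print("%d training waveforms (sa1/sa2 excluded)" % len(wavs))
</verbatim>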

Project workspace is at /group/cstr/projects/dbns/v1bforbe

Preparing the data:

  • parameterise waveforms as PLPs, put in Quicknet format (a 'pfile')
    • Joe will do this
  • TIMIT labels -> Quicknet label files. Attached below are master label files (MLFs) for both the original 61-phone set and the standard reduced 39-phone set (timit.mlf and timit39.mlf); each is an archive of all the labels.

    • Step 1: collapse the phone labels down to feature labels (do this for each individual feature) - write a Python script to do this (a minimal sketch follows this list)
    • Step 2: convert these collapsed label files into Quicknet targets (Joe will do this)
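A minimal sketch of the Step 1 script, assuming HTK-style MLF label lines of the form "start end phone". The phone-to-feature table shown is a small illustrative fragment for a hypothetical binary 'voiced' feature, not the real feature definitions; the real tables come from the feature system, one table per feature:

<verbatim>
import sys

# Illustrative fragment only: phone -> feature value for one
# hypothetical binary feature ('voiced').
PHONE_HAS_FEATURE = {
    "b": 1, "d": 1, "g": 1, "p": 0, "t": 0, "k": 0,
    # ... one entry per phone in the chosen phone set
}

def collapse_mlf(mlf_path, out_path, table):
    """Rewrite an MLF, replacing each phone label with its feature value."""
    with open(mlf_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            parts = line.split()
            # HTK-style label lines look like: start end phone
            if len(parts) == 3 and parts[0].isdigit():
                start, end, phone = parts
                # an unmapped phone raises KeyError, flagging a gap in the table
                fout.write("%s %s %d\n" % (start, end, table[phone]))
            else:
                # pass through headers ('#!MLF!#'), file names, '.' terminators
                fout.write(line)

if __name__ == "__main__":
    collapse_mlf(sys.argv[1], sys.argv[2], PHONE_HAS_FEATURE)
</verbatim>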


Start with Quicknet. We might also try Nico if time permits.

Training the nets

  • Quicknet version
    • There will be one net per feature; each will have two outputs, one for "feature=1" and the other for "feature=0"
    • Softmax over these outputs (illustrated in the sketch after this list)
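For illustration, a tiny numpy sketch of that output layer; the two activations sum to one after the softmax, so they can be read as posterior probabilities for "feature=1" and "feature=0" (the input activations below are made-up numbers):

<verbatim>
import numpy as np

def softmax(z):
    z = z - np.max(z)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# hypothetical pre-softmax activations for one frame
logits = np.array([1.7, -0.4])   # [feature=1, feature=0]
post = softmax(logits)
print("P(feature=1) = %.3f, P(feature=0) = %.3f" % (post[0], post[1]))
</verbatim>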

Computing accuracy

1) Framewise accuracy: count the frames where all features are correct (also compute per-feature results)

2) Mapped-to-phones framewise accuracy: map each frame's network output to the nearest (Euclidean distance) valid feature combination, i.e. to a phone, then compute as above

3) Allowing for timing errors: allow a "collar" around phone boundaries when scoring (e.g. ignore those frames)
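A minimal numpy sketch of the three schemes, assuming the net outputs and reference labels arrive as (n_frames, n_features) arrays with values in {0, 1} after thresholding or mapping; the valid-combination table and the boundary frame indices are placeholders to be filled in from the feature definitions and the TIMIT label times:

<verbatim>
import numpy as np

def framewise_accuracy(pred, ref):
    """1) Fraction of frames where every feature is correct,
    plus a per-feature breakdown."""
    all_correct = np.all(pred == ref, axis=1).mean()
    per_feature = (pred == ref).mean(axis=0)
    return all_correct, per_feature

def map_to_valid(outputs, valid):
    """2) Snap each frame's real-valued output vector to the nearest
    valid feature combination (Euclidean distance).
    outputs: (n_frames, n_features); valid: (n_combinations, n_features)."""
    d = np.linalg.norm(outputs[:, None, :] - valid[None, :, :], axis=2)
    return valid[np.argmin(d, axis=1)]

def collar_mask(boundary_frames, n_frames, collar):
    """3) Boolean mask that is False within `collar` frames of each
    phone boundary; apply it to ignore those frames when scoring."""
    mask = np.ones(n_frames, dtype=bool)
    for b in boundary_frames:
        mask[max(0, b - collar):b + collar] = False
    return mask
</verbatim>

To score with a collar, apply the mask before counting, e.g. framewise_accuracy(pred[mask], ref[mask]).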



Experiments Completed:

* First-pass training and testing of the wave-mechanical features

* Confirmation (to within about 1%) on the Quicknet system of previous results for multi-valued features and GP (King and Taylor, 2000)

Experiments to do before final assessment:

* Train/test on an alternative velar parameterisation (Joe to generate pfiles from test_code3a.txt)

* Train/test on an alternative parameterisation of the silence/closure phase of oral stops (Joe to generate new files from an amended feature file I will send)

* As time allows, use the Quicknet add-on to train all features at once as well as individually. This will give a better measure of phoneme recognition ('all correct together') when the features are maximally independent. To be discussed with Joe.

Updated 20/03/2007

Topic attachments
  Attachment        Size       Date                  Who
  sh                5.4 K      19 Feb 2007 - 10:19   Main.mwester
  timit.mlf         3927.2 K   16 Feb 2007 - 10:31   Main.joe
  timit39.mlf       2805.3 K   16 Feb 2007 - 10:31   Main.joe
  train_test_ANNs   6.6 K      19 Feb 2007 - 10:14   Main.mwester