-- Main.simonk - 15 Feb 2007
Barbara Forbes : articulatory feature recognition
Goals
Test a new articulatory feature representation by training ANNs to detect the features in speech, then compare performance with previous experiments that used other feature systems
Background
Add some references here
Method
Data
TIMIT is at /group/corpora/public/timit/original
Important note about TIMIT: do NOT use the sa1 and sa2 files (the 'shibboleth' utterances); only use the sx and si utterances (8 of these in total per speaker)
Project workspace is at /group/cstr/projects/dbns/v1bforbe
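A minimal Python sketch of the utterance selection rule above; the directory layout under the TIMIT root and the lower-case file extensions are assumptions, so adjust the glob if the local copy differs:
<verbatim>
import glob
import os

TIMIT_ROOT = "/group/corpora/public/timit/original"

def usable_utterances(root=TIMIT_ROOT):
    """Yield .wav paths for sx and si utterances only (skip sa1/sa2)."""
    # assumed layout: <root>/<train|test>/dr<N>/<speaker>/<utt>.wav
    for wav in glob.glob(os.path.join(root, "*", "dr*", "*", "*.wav")):
        if not os.path.basename(wav).lower().startswith("sa"):
            yield wav

if __name__ == "__main__":
    for path in usable_utterances():
        print(path)
</verbatim>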
Preparing the data:
- parameterise waveforms as PLPs, put in Quicknet format (a 'pfile')
- TIMIT labels -> Quicknet label files. Master label files (MLFs) are provided for both the original 61-phone set and the standard reduced 39-phone set; these files are archives of all the labels.
- Step 1: collapse phone labels down to feature labels (do this for each individual feature); write a Python script to do this (see the sketch after this list)
- Step 2: convert these collapsed labels files into Quicknet targets (Joe will do this)
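One possible shape for the Step 1 script, sketched below; the phone-to-feature table is a made-up fragment for illustration (the real values come from the feature definition file), and a simple "start end phone" line format for the labels is assumed:
<verbatim>
import sys

# Fragment only, for illustration; fill in from the feature definition file.
PHONE_FEATURES = {
    "p":  {"voiced": 0, "labial": 1},
    "b":  {"voiced": 1, "labial": 1},
    "iy": {"voiced": 1, "labial": 0},
    # ... one entry per phone in the 61- (or 39-) phone set
}

def collapse(label_lines, feature):
    """Map lines of 'start end phone' to 'start end 0/1' for one feature."""
    out = []
    for line in label_lines:
        start, end, phone = line.split()
        out.append("%s %s %d" % (start, end, PHONE_FEATURES[phone][feature]))
    return out

if __name__ == "__main__":
    feature = sys.argv[1]  # e.g. 'voiced'; run once per feature
    lines = [l for l in sys.stdin if l.strip()]
    print("\n".join(collapse(lines, feature)))
</verbatim>
Run it once per feature, e.g. python collapse_labels.py voiced < utt.lab > utt.voiced.lab (script and file names here are hypothetical).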
Tools
Start with Quicknet. We might also try Nico if time permits.
Training the nets
- Quicknet version
- There will be one net per feature; it will have two outputs, one for "feature=1" and the other for "feature=0"
- Softmax over these outputs
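For concreteness, the posterior for one feature comes out of the two-output softmax like this (a numpy sketch, not Quicknet code):
<verbatim>
import numpy as np

def feature_posterior(a1, a0):
    """P(feature=1 | frame) from the activations of the two output units."""
    m = max(a1, a0)                       # subtract the max for numerical safety
    e1, e0 = np.exp(a1 - m), np.exp(a0 - m)
    return e1 / (e1 + e0)

print(feature_posterior(2.0, -1.0))       # ~0.95
</verbatim>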
Computing accuracy
1) Framewise accuracy: count the frames where all features are correct (also compute results per feature)
2) Mapped-to-phones framewise accuracy: map the network output for each frame to the nearest (Euclidean distance) valid feature combination, i.e. to a phone, then compute as above
3) Allowing for timing errors: allow a "collar" of frames around phone boundaries when scoring (e.g. ignore those frames)
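A hedged numpy sketch of these three schemes; the names and shapes are assumptions (pred and ref are (n_frames, n_features) arrays of feature values, valid_combos is (n_phones, n_features)):
<verbatim>
import numpy as np

def framewise_accuracy(pred, ref):
    """1) Fraction of frames where ALL features are correct.
    Per-feature results: (pred == ref).mean(axis=0)."""
    return np.mean(np.all(pred == ref, axis=1))

def map_to_nearest(pred, valid_combos):
    """2) Snap each frame's output vector to the nearest valid feature
    combination (Euclidean distance), i.e. to a phone."""
    d = ((pred[:, None, :] - valid_combos[None, :, :]) ** 2).sum(axis=2)
    return valid_combos[np.argmin(d, axis=1)]

def collar_mask(boundaries, n_frames, collar):
    """3) True for frames to score; frames within 'collar' frames of a
    phone boundary are ignored."""
    keep = np.ones(n_frames, dtype=bool)
    for b in boundaries:
        keep[max(0, b - collar):b + collar] = False
    return keep
</verbatim>
Scheme 2 with a collar would then be framewise_accuracy(map_to_nearest(pred, combos)[keep], ref[keep]).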
Results
BarbaraForbesResults
Experiments Completed:
* First pass training and testing of wave-mechanical features
* Confirmation (to within about 1%), using the Quicknet system, of previous results for multi-valued features and GP (King and Taylor, 2000)
Experiments to do before final assessment:
* Train/test on alternative velar parameterisation (Joe to generate pfiles from test_code3a.txt)
* Train/test on alternative parameterisation of silence/closure phase of oral stops (Joe to generate new files from an amended feature file I will send)
* As time allows, use the Quicknet add-on to train all features at once as well as individually. This will give a better measure of phoneme recognition ('all correct together') when the features are maximally independent. To be discussed with Joe.
Updated 20/03/2007