ParthaLal (01 Feb 2010, Main.s0565860)
  • Italicised figures are WERs on the GlobalPhone dev set;
  • all other numbers are WERs on the eval set unless otherwise stated.
  • (hmm, strange things going on in crosslingual tandem going from dev to eval...)

| Language | baseline || MLP frame acc. | monolingual || monolingual AF || xlingual || xling w/MLLRMEAN || Martin, 32G tri | Schultz pdf | Reichert et al. |
| Spanish | 30.7 | 26.0 | 64.41 | 25.6 | 16.8 |  |  | 27.6(po) | 18.4(po) |  | 18.2(po) | 26.6 | 20 |  |
| German | 30.8 | 29.2 | 71.07 | 22.2 | 23.0 |  |  | 25.1(po) | RUN ME! | 25.4(po) |  | 37.3 | 11.8 |  |
| Swedish | 54.3 | 53.0 | 60.22 | 42.6 | 47.5 |  |  | 46.1(ge) | 49.5(ge) |  |  |  |  |  |
| Portuguese | 27.1 | 25.0 | 57.81 | 23.8 | 22.6 | 25.2 |  | 26.4(sp) | 24.7(sp) |  |  |  | 19 |  |
| Russian | 47.8 | 41.6 | 59.59 | 36.2 | 31.6 |  |  | 38.4(ge) | 34.2(ge) |  |  |  |  |  |
| Chinese | 28.1 | 35.4 |  |  |  |  |  |  |  |  |  |  |  | ... |

  • Chinese: the current result ignores tone and uses a dictionary limited to 65535 pronunciations, obtained by randomly excluding words that are not in the training set.
  • German: for the crosslingual-adapted result I'm not sure I applied the right transform; it might be the broken CMLLR rather than the working MLLRMEAN used for Spanish...
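
A minimal sketch of the dictionary pruning described in the Chinese note, assuming the lexicon is held as a mapping from word to a list of pronunciations. The function name, the data layout and the `seed` parameter are hypothetical, not the script actually used. Words seen in training are always kept; out-of-training words are visited in random order and kept only while the pronunciation count stays within the limit.

```python
import random

MAX_PRONS = 65535  # limit quoted in the note above


def limit_dictionary(lexicon, train_words, max_prons=MAX_PRONS, seed=0):
    """Return a pruned copy of `lexicon` (word -> list of pronunciations).

    Every word in `train_words` is kept. The remaining words are shuffled
    and added back only while the total number of pronunciations stays at
    or below `max_prons`, so the excluded words are a random subset of the
    out-of-training vocabulary.
    """
    kept = [w for w in lexicon if w in train_words]
    candidates = [w for w in lexicon if w not in train_words]
    random.Random(seed).shuffle(candidates)

    count = sum(len(lexicon[w]) for w in kept)
    for word in candidates:
        if count + len(lexicon[word]) <= max_prons:
            kept.append(word)
            count += len(lexicon[word])
    return {w: lexicon[w] for w in kept}
```

Note that this sketch never drops training words, so if the training vocabulary alone exceeds the limit the result will still be too large.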

Tuning hidden layer sizes:

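| parameters/data (%) | 3-layer MLP (frame acc%) |||||| 5-layer bottleneck MLP (frame acc%) |
|  | German | Portuguese | Spanish | Swedish | Russian | Mandarin | German |
| 5 | 68.27 |  |  |  |  |  | 66.19 |
| 10 | 69.40 |  |  |  |  |  | 66.52 |
| 15 | 70.00 |  |  |  |  |  | 65.82 |
| 20 | 70.38 |  |  |  |  |  | 64.57 |
| 25 |  |  | 63.87 |  |  |  |  |
| 30 | 70.88 | 57.63 | 64.00 |  |  |  | - |
| 35 | 71.07 | 57.66 | 64.05 | 55.06 |  |  | - |
| 40 | 70.91 | 56.95 | 64.28 | 55.10 | 59.48 |  |  |
| 45 | 71.16 | 57.81 | 64.37 | 55.24 | 59.59 |  |  |
| 50 |  | 57.81 | 64.41 | 55.22 | 59.55 |  | - |
| 55 |  |  | 64.37 |  |  |  |  |
| 60 |  |  | 64.21 |  |  |  |  |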

To try to find differences between the Spanish and Portuguese data (so that we can normalize them out), I've computed the long-term average spectrum (LTAS) of 25 randomly chosen utterances from each corpus [plotted in attachment]. How can I actually say anything about them, though? I could model the LTASs from each language with a Gaussian and compute the KL divergence between the two. Or simply plot all the LTASs instead of the averages and inspect them.
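
The Gaussian/KL idea could be sketched as follows. This assumes each LTAS is a fixed-length list of per-bin values (e.g. log magnitudes) and uses a diagonal covariance, since 25 utterances per language are too few to estimate a full covariance over many frequency bins. The function names and the variance floor are assumptions for illustration.

```python
import math


def fit_diag_gaussian(spectra, var_floor=1e-6):
    """Fit a diagonal-covariance Gaussian to equal-length LTAS vectors.

    Returns (mean, var) as lists with one entry per frequency bin.
    Variances are floored to keep the KL computation finite.
    """
    n = len(spectra)
    dim = len(spectra[0])
    mean = [sum(s[d] for s in spectra) / n for d in range(dim)]
    var = [
        max(sum((s[d] - mean[d]) ** 2 for s in spectra) / n, var_floor)
        for d in range(dim)
    ]
    return mean, var


def kl_diag_gaussians(p, q):
    """KL(p || q) for two diagonal Gaussians given as (mean, var) pairs."""
    mean_p, var_p = p
    mean_q, var_q = q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def symmetric_kl(p, q):
    """Symmetrised divergence, since KL itself depends on argument order."""
    return kl_diag_gaussians(p, q) + kl_diag_gaussians(q, p)
```

Calling `symmetric_kl(fit_diag_gaussian(sp_ltas), fit_diag_gaussian(po_ltas))` would give a single number for the pair of languages; comparing it against the same quantity computed between two random halves of one language gives some idea of whether the difference is larger than within-corpus variation.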

Another thing to look at is the MLP outputs in the crosslingual case. To work out whether they're sensible, I could compute the entropy of the output distribution at each frame [over the test data] and then compare the mean and variance of those entropies for the sp and po nets applied to sp and po data [four combinations]. In progress.
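
A small sketch of the entropy check, assuming the net outputs for an utterance are available as a list of frames, each a list of non-negative posteriors over the output classes. The function names and data layout are hypothetical. Each frame is renormalised before the entropy (in nats) is computed.

```python
import math


def frame_entropy(posteriors):
    """Entropy in nats of one frame of MLP outputs.

    The outputs are renormalised to sum to one; zero-probability classes
    contribute nothing to the sum.
    """
    total = sum(posteriors)
    probs = [p / total for p in posteriors]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_stats(frames):
    """Mean and (population) variance of per-frame entropies."""
    entropies = [frame_entropy(f) for f in frames]
    mean = sum(entropies) / len(entropies)
    var = sum((e - mean) ** 2 for e in entropies) / len(entropies)
    return mean, var
```

Running `entropy_stats` once for each of the four net/data pairings (sp net on sp data, sp net on po data, po net on sp data, po net on po data) would give the comparison described above. A markedly higher mean entropy in the mismatched pairings would suggest the net is much less certain on the other language's data.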

  • Portuguese net outputs for utterance PO135_45, compared to labels:
    PO135_45-po_net_output.jpg

  • Spanish net outputs for utterance PO135_45, compared to labels:
    PO135_45-sp_net_output.jpg

  • Long-term average spectrum of Spanish and Portuguese eval sets, with error bars showing one std. dev.:
    sp_po_ltas.jpg

Results for grapheme-based models (low priority):

| Language | WER(%) | Tandem WER(%) |
| po |  |  |
| sp | 31.6 |  |
| sw | 49.5 | tying trees malformed |

  • mean PLPs (+deltas,delta-deltas) with std-devs shown. Something weird with the Spanish ones...:
    adapting_PLPs.jpg
Topic attachments
| Attachment | Size | Date | Who | Comment |
| 10.1.1.9.8233.pdf | 265.8 K | 24 Nov 2009 - 10:57 | Main.s0565860 | Language Independent and Language Adaptive Acoustic Modeling for Speech Recognition - Schultz & Waibel |
| PO135_45-po_net_output.jpg | 54.4 K | 28 Sep 2009 - 09:55 | Main.s0565860 | Portuguese net outputs for utterance PO135_45, compared to labels |
| PO135_45-sp_net_output.jpg | 55.1 K | 28 Sep 2009 - 10:06 | Main.s0565860 | Spanish net outputs for utterance PO135_45, compared to labels |
| adapting_PLPs.jpg | 34.8 K | 18 Jan 2010 - 15:39 | Main.s0565860 | mean PLPs (+deltas,delta-deltas) with std-devs shown. Something weird with the Spanish ones... |
| optimal_s_p_sp_upmixed.txt | 4.4 K | 19 Sep 2007 - 08:41 | Main.s0565860 |  |
| sp_po_ltas.jpg | 70.1 K | 29 Sep 2009 - 11:44 | Main.s0565860 | Long-term average spectrum of Spanish and Portuguese eval sets, with error bars showing one std. dev. |
| sp_po_ltas.pdf | 2.4 K | 23 Sep 2009 - 15:33 | Main.s0565860 | Long-term Spectrum of 25 sp & po utts (sp in red) |
| the_mean_of_the_middle_state_of_M_o.txt | 2.3 K | 14 Sep 2007 - 09:38 | Main.s0565860 |  |
Topic revision: r115 - 01 Feb 2010 - 20:35:16 - Main.s0565860