F0parametrisation (10 Mar 2010, Main.v1astan)

F0 parametrisation using DCT coefficients

First results

02.03.2010

The experiment goes like this:

  1. Extract F0 from the original wav files using one of the methods used in HTS
  2. Interpolate F0 using the Matlab code: writeInterpolatedF0.m
  3. Based on the interpolated F0 and the HTS labels, extract F0 curves at syllable level, together with features related to the syllable, word and phrase. For example, based on the Zen06 description, retain only: a1-c3, g1-j3 inclusive.
  4. We now have, for every syllable in the utterance, the corresponding F0 curve and HTS-like features
  5. From each syllable-level F0 curve, extract 5 DCT coefficients, using the Matlab code: writeDCTCoeffs.m
  6. Concatenate the syllable-level features and DCT coefficients to obtain an ARFF format file. The header for the ARFF file is attached to this topic (attachment "header")
  7. Use WEKA's M5P regression trees to predict the first 3 DCT coefficients: DCT1, DCT2, DCT3. For DCT4 and DCT5 use the mean value of the training data
  8. For reconstruction of a given utterance, obtain the syllable-level features and predict the DCT coefficients. For the inverse DCT, use the original DCT1 coefficient, the predicted DCT2 and DCT3, and the training-data mean for DCT4 and DCT5.
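
Steps 2 and 5 can be sketched roughly as below. This is not the content of writeInterpolatedF0.m or writeDCTCoeffs.m, which are not shown on this page; it is a minimal Python sketch under assumptions: unvoiced frames are marked with 0, gaps are filled by linear interpolation, the DCT is the orthonormal DCT-II (the normalisation used by Matlab's dct), and every syllable has at least 5 frames. The function names interpolate_f0 and dct_coeffs are hypothetical.

```python
import math


def interpolate_f0(f0):
    """Fill unvoiced frames (assumed to be marked as 0) by linear interpolation."""
    voiced = [i for i, v in enumerate(f0) if v > 0]
    if not voiced:
        raise ValueError("no voiced frames to interpolate from")
    out = [float(v) for v in f0]
    # Hold the first / last voiced value at the utterance edges.
    for i in range(voiced[0]):
        out[i] = float(f0[voiced[0]])
    for i in range(voiced[-1] + 1, len(f0)):
        out[i] = float(f0[voiced[-1]])
    # Straight line between the voiced frames on either side of each gap.
    for a, b in zip(voiced, voiced[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            out[i] = f0[a] + t * (f0[b] - f0[a])
    return out


def dct_coeffs(curve, n_coeffs=5):
    """First n_coeffs coefficients of the orthonormal DCT-II of one F0 curve."""
    n = len(curve)
    coeffs = []
    for k in range(n_coeffs):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(curve))
        coeffs.append(scale * total)
    return coeffs


# One hypothetical syllable: 8 frames with an unvoiced stretch in the middle.
syllable_f0 = interpolate_f0([120, 124, 0, 0, 131, 128, 125, 0])
dct1_to_dct5 = dct_coeffs(syllable_f0)
```

With this normalisation DCT1 is the mean F0 of the syllable scaled by the square root of the number of frames, so it carries the overall pitch level, while the higher coefficients describe the shape of the contour.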

The James database was used as training data; the corpus contains approx. 15000 syllables.

jam_300 and jam_301 were left aside for testing purposes.
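
A hedged sketch of how the held-out split and the ARFF file could be produced is given below. The attribute names, the relation name and the row layout (utterance id, feature list, DCT list) are hypothetical, and all features are treated as numeric for simplicity; the actual header used in the experiment is the attached "header" file, which is not reproduced here.

```python
# Utterances held out for testing, written as they are named in the text.
HELD_OUT = {"jam_300", "jam_301"}


def split_rows(rows):
    """Split (utt_id, features, dct_coeffs) rows into training and test sets."""
    train = [r for r in rows if r[0] not in HELD_OUT]
    test = [r for r in rows if r[0] in HELD_OUT]
    return train, test


def to_arff(rows, feature_names, target_index, relation="syllable_f0"):
    """Render rows as ARFF text with one DCT coefficient as the last attribute."""
    target_name = "DCT%d" % (target_index + 1)
    lines = ["@relation " + relation, ""]
    for name in feature_names:
        # Assumption: every feature is numeric; nominal features would need
        # an explicit value list instead of the 'numeric' keyword.
        lines.append("@attribute %s numeric" % name)
    lines.append("@attribute %s numeric" % target_name)
    lines += ["", "@data"]
    for _utt, feats, dct in rows:
        values = list(feats) + [dct[target_index]]
        lines.append(",".join("%g" % v for v in values))
    return "\n".join(lines) + "\n"


# Two hypothetical syllables with two made-up features each.
rows = [
    ("jam_001", [1, 2], [250.0, -3.5, 1.0, 0.2, -0.1]),
    ("jam_300", [3, 4], [240.0, 2.0, 0.5, 0.1, 0.0]),
]
train_rows, test_rows = split_rows(rows)
arff_for_dct2 = to_arff(train_rows, ["a1", "a2"], target_index=1)
```

One file per target coefficient keeps each regression tree a single-output problem, which matches training separate trees for DCT1, DCT2 and DCT3.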

The DCT coefficients for the 2 utterances were estimated using the original DCT1 coefficient, the estimated DCT2 and DCT3, and the mean value of the training data for DCT4 and DCT5.
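
The mixing of coefficients described here could look roughly like the following Python sketch. It assumes an orthonormal DCT-II was used for analysis, so that the matching inverse rebuilds the curve; the helper names (column_means, assemble_coeffs, inverse_dct) and all numbers are hypothetical and only illustrate the bookkeeping.

```python
import math


def column_means(train_coeffs):
    """Mean of each DCT coefficient over all training syllables."""
    n = len(train_coeffs)
    width = len(train_coeffs[0])
    return [sum(row[k] for row in train_coeffs) / n for k in range(width)]


def assemble_coeffs(original_dct1, predicted_dct2, predicted_dct3, means):
    """Original DCT1, predicted DCT2 and DCT3, training means for DCT4 and DCT5."""
    return [original_dct1, predicted_dct2, predicted_dct3, means[3], means[4]]


def inverse_dct(coeffs, n_frames):
    """Rebuild an n_frames-long curve from truncated orthonormal DCT-II coefficients."""
    curve = []
    for i in range(n_frames):
        value = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n_frames) if k == 0 else math.sqrt(2.0 / n_frames)
            value += scale * c * math.cos(math.pi * (i + 0.5) * k / n_frames)
        curve.append(value)
    return curve


# Made-up numbers for one 8-frame syllable.
means = column_means([[350.0, 2.0, -1.0, 0.4, 0.2],
                      [340.0, -2.0, 1.0, 0.0, -0.2]])
coeffs = assemble_coeffs(original_dct1=353.5, predicted_dct2=-4.0,
                         predicted_dct3=1.5, means=means)
f0_curve = inverse_dct(coeffs, n_frames=8)
```

Because DCT1 is taken from the original, the reconstructed syllable keeps its true mean pitch; only the shape within the syllable depends on the predictions.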

The following 2 figures compare the F0 curves obtained through inverse DCT of the original and the estimated DCT coefficients:

jam_300 (61 syllables): see attachment James300.jpg

jam_301 (31 syllables): see attachment James301.jpg

Audio samples

- 2 utterances jam_0300 and jam_0301

UPDATE 10-03-2010: Added samples from phrase and syllable DCT predictions:

  • These files are synthesized from F0 contours obtained from both phrase-level and syllable-level DCT coefficient predictions. DCT1 is from the phrase-level prediction and DCT2-5 are from the syllable-level prediction. The result is flatter than the original one.


-- Main.v1astan - 02 Mar 2010

Topic attachments
Attachment               Size       Date                  Who
James300.jpg             77.3 K     02 Mar 2010 - 16:41   Main.v1astan
James301.jpg             67.1 K     02 Mar 2010 - 16:41   Main.v1astan
header                   2.0 K      02 Mar 2010 - 16:56   Main.v1astan
hts_lab_format.pdf       18.4 K     02 Mar 2010 - 16:52   Main.v1astan
jam_actual_0300.wav      1184.8 K   05 Mar 2010 - 14:49   Main.v1astan
jam_actual_0301.wav      683.5 K    05 Mar 2010 - 14:49   Main.v1astan
jam_original_0300.wav    1184.8 K   03 Mar 2010 - 09:37   Main.v1astan
jam_original_0301.wav    683.5 K    03 Mar 2010 - 09:37   Main.v1astan
jam_phr_syll_0300.wav    1184.8 K   10 Mar 2010 - 10:42   Main.v1astan
jam_phr_syll_0301.wav    683.5 K    10 Mar 2010 - 10:42   Main.v1astan
jam_phrsyll_0300.wav     1184.8 K   10 Mar 2010 - 10:41   Main.v1astan
jam_predicted_0300.wav   1184.8 K   05 Mar 2010 - 14:50   Main.v1astan
jam_predicted_0301.wav   683.5 K    05 Mar 2010 - 14:50   Main.v1astan
Topic revision: r4 - 10 Mar 2010 - 10:46:30 - Main.v1astan
 