F0 parametrisation using DCT coefficients
First results
02.03.2010
The experiment goes like this:
- Extract F0 from the original wav files using one of the F0 extraction methods used in HTS
- Interpolate F0 using the Matlab code: writeInterpolatedF0.m
- Based on the interpolated F0 and the HTS labels, extract the F0 curve of each syllable, together with features related to the syllable, word and phrase. Following the Zen06 description of the labels, retain only features a1-c3 and g1-j3 (inclusive).
- We now have, for every syllable in the utterance, the corresponding F0 curve and HTS-like features
- From each syllable-level F0 curve, extract 5 DCT coefficients using the Matlab code: writeDCTCoeffs.m
- Concatenate the syllable-level features and DCT coefficients to obtain a file in ARFF format. The header for the ARFF file is here.
- Use WEKA's M5P regression trees to predict the first 3 DCT coefficients (DCT1, DCT2, DCT3). For DCT4 and DCT5, use the mean value over the training data.
- To reconstruct a given utterance, obtain its syllable-level features and predict the DCT coefficients. For the inverse DCT, use the original DCT1 coefficient together with the estimated values of the remaining coefficients.
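The Matlab scripts writeInterpolatedF0.m and writeDCTCoeffs.m are not reproduced on this page. As a rough illustration of what the interpolation and DCT extraction steps above involve, the Python sketch below assumes that unvoiced frames are marked with F0 = 0 and filled by linear interpolation, that syllable boundaries are available as frame index ranges derived from the HTS labels, and that the coefficients use the orthonormal DCT-II scaling of Matlab's dct. The function names (interpolate_f0, dct2, syllable_dct) are hypothetical, and the actual scripts may differ in any of these details.

```python
import math

def interpolate_f0(f0):
    """Fill unvoiced frames (assumed F0 == 0) by linear interpolation between
    the surrounding voiced frames; leading/trailing unvoiced runs copy the
    nearest voiced value."""
    voiced = [i for i, v in enumerate(f0) if v > 0]
    if not voiced:
        raise ValueError("utterance has no voiced frames")
    out = [float(v) for v in f0]
    for i in range(voiced[0]):                      # leading unvoiced run
        out[i] = float(f0[voiced[0]])
    for i in range(voiced[-1] + 1, len(f0)):        # trailing unvoiced run
        out[i] = float(f0[voiced[-1]])
    for a, b in zip(voiced, voiced[1:]):            # interior unvoiced gaps
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            out[i] = (1 - t) * f0[a] + t * f0[b]
    return out

def dct2(x, n_coeffs=5):
    """First n_coeffs terms of the orthonormal DCT-II (Matlab dct scaling).
    Syllables shorter than n_coeffs frames get zero-padded coefficients
    (an assumption, not taken from the original scripts)."""
    N = len(x)
    coeffs = []
    for k in range(min(n_coeffs, N)):
        w = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        s = sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
        coeffs.append(w * s)
    return coeffs + [0.0] * (n_coeffs - len(coeffs))

def syllable_dct(f0_interp, syllables, n_coeffs=5):
    """syllables: hypothetical (start_frame, end_frame) pairs, end exclusive."""
    return [dct2(f0_interp[s:e], n_coeffs) for s, e in syllables]

# Hypothetical usage: a 9-frame utterance split into two syllables
f0 = interpolate_f0([0, 0, 120, 125, 0, 0, 118, 110, 0])
coeffs = syllable_dct(f0, [(0, 4), (4, 9)])
```

With this scaling DCT1 equals the syllable's mean F0 multiplied by the square root of its length in frames, so it carries the average level while the higher coefficients describe the shape of the curve.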
The James database was used for training; the corpus contains approx. 15,000 syllables.
jam_300 and jam_301 were held out for testing.
The DCT coefficients for the 2 test utterances were estimated using:
- the original DCT1 coefficient
- DCT2 and DCT3 predicted by the M5P trees
- the mean value of the training data for DCT4 and DCT5.
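A minimal sketch of this reconstruction step, assuming the coefficients use the orthonormal DCT-II scaling of Matlab's dct (so the inverse matches Matlab's idct) and that coefficients beyond the fifth are zero. The helper names (idct2, training_means, reconstruct_syllable) are hypothetical, the predicted DCT2/DCT3 values are assumed to be available as plain numbers from the M5P models, and the numbers in the usage lines are made up for illustration.

```python
import math

def idct2(coeffs, length):
    """Inverse of the orthonormal DCT-II (Matlab idct scaling).
    Coefficients beyond len(coeffs) are treated as zero, so with only 5
    coefficients the result is a smoothed version of the syllable F0 curve."""
    out = []
    for n in range(length):
        s = 0.0
        for k, c in enumerate(coeffs[:length]):  # at most `length` basis functions exist
            w = math.sqrt(1 / length) if k == 0 else math.sqrt(2 / length)
            s += w * c * math.cos(math.pi * (2 * n + 1) * k / (2 * length))
        out.append(s)
    return out

def training_means(train_coeffs, indices=(3, 4)):
    """Mean of DCT4 and DCT5 (0-based indices 3 and 4) over the training syllables."""
    return {k: sum(row[k] for row in train_coeffs) / len(train_coeffs) for k in indices}

def reconstruct_syllable(orig_dct1, pred_dct2, pred_dct3, means, length):
    """Rebuild one syllable's F0 curve from the mixed coefficient vector:
    original DCT1, predicted DCT2/DCT3, training-mean DCT4/DCT5."""
    coeffs = [orig_dct1, pred_dct2, pred_dct3, means[3], means[4]]
    return idct2(coeffs, length)

# Hypothetical usage: made-up training rows and a 10-frame test syllable
means = training_means([[500.0, 3.0, -1.0, 0.4, 0.1], [520.0, 1.0, 0.5, 0.2, -0.1]])
curve = reconstruct_syllable(480.0, 2.5, -0.8, means, length=10)
```

Because the higher DCT basis functions sum to zero over the syllable, keeping the original DCT1 preserves each syllable's mean F0 exactly, so differences between the two curves in the figures come only from the shape coefficients.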
The following 2 figures compare the F0 curves obtained through the inverse DCT of the original and of the estimated DCT coefficients:
jam_300 (61 syllables)
jam_301 (31 syllables)
Audio samples
- 2 utterances jam_0300 and jam_0301
-- Main.v1astan - 02 Mar 2010
Topic revision: r2 - 03 Mar 2010 - 09:38:15 - Main.v1astan