-- Main.s0566164 - 23 Apr 2010

The following are items which we need to do (adapted from Junichi's email of 20/4/2010). Please edit the wiki to indicate when each task has been completed.

1. The original recordings:
We definitely need to have the original recordings of this movie.

Currently an mp4 file is used, but we need the very first original
recording, without any editing or signal processing such as
- format conversion
- sampling rate conversion
- noise suppression

2. Transcriptions

Please transcribe what he is saying in this movie very precisely.
Note that it is essential to transcribe everything, including filled pauses, disfluencies, mispronunciations etc.

3. Chop up waveforms

Please chop up the audio track sentence by sentence, based on the transcriptions above and by eye.
Note that the raw audio track must be used, and the silences at the beginning and end of each sentence
must be the same length.
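
As a rough illustration of the chopping step, the sketch below cuts one clip per sentence and keeps the same amount of surrounding audio on both sides of the speech. The function name, the hand-marked boundary times and the 0.2 s padding are all assumptions made for this example, not values agreed for the project. The padding is taken from the raw track itself, so it only counts as silence if the speaker really pauses that long between sentences.

```python
def chop_sentences(samples, sample_rate, boundaries, pad_seconds=0.2):
    """Cut one clip per sentence with equal padding before and after the speech.

    samples      -- sequence of raw audio samples (no prior processing)
    sample_rate  -- samples per second of the raw track
    boundaries   -- list of (speech_start_sec, speech_end_sec), marked by hand
    pad_seconds  -- hypothetical padding length; the same value is used on both sides
    """
    pad = int(round(pad_seconds * sample_rate))
    clips = []
    for start_sec, end_sec in boundaries:
        start = int(round(start_sec * sample_rate))
        end = int(round(end_sec * sample_rate))
        # Refuse to produce a clip whose leading and trailing padding would differ.
        if start - pad < 0 or end + pad > len(samples):
            raise ValueError("not enough audio around the sentence to pad equally")
        clips.append(samples[start - pad:end + pad])
    return clips
```

For example, with a 16 kHz track, `chop_sentences(track, 16000, [(0.5, 1.0), (1.5, 2.5)])` would return two clips, each with 3200 samples of padding on either side of the marked speech.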

4. Out-of-vocabulary words and generation of initial phoneme sequences

Oliver will generate a pronunciation dictionary for the out-of-vocabulary words included in the transcriptions
and will generate phoneme sequences for all the sentences.

5. Modify the phoneme sequences and annotate scores for pronunciation accuracy

Please modify the phoneme sequences based on how his speech actually sounds.
This must be done using the Unisyn phoneset.

Please add scores on a 5-point scale for the accuracy of the pronunciation of each phoneme and word.
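
One possible way to store these annotations is sketched below. It assumes the 5-point scale runs from 1 (worst) to 5 (best), which the email does not specify, and the class names are invented for this example. The phone labels in the usage note are placeholders, not real Unisyn symbols.

```python
from dataclasses import dataclass, field
from typing import List


def _check_score(score):
    # Assumed scale: integers 1 (worst) to 5 (best).
    if not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError("score must be an integer from 1 to 5")


@dataclass
class PhonemeAnnotation:
    symbol: str  # phone label from the phoneset in use
    score: int   # pronunciation accuracy of this phoneme

    def __post_init__(self):
        _check_score(self.score)


@dataclass
class WordAnnotation:
    word: str    # orthographic form from the transcription
    score: int   # pronunciation accuracy of the whole word
    phonemes: List[PhonemeAnnotation] = field(default_factory=list)

    def __post_init__(self):
        _check_score(self.score)
```

A word would then be recorded as, for instance, `WordAnnotation("example", 4, [PhonemeAnnotation("PH1", 5), PhonemeAnnotation("PH2", 3)])`, where `PH1` and `PH2` stand in for the corrected phone labels.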

6. Microphone characteristics and band-pass filtering

Since there is a lot of background noise in this movie, we need to suppress the noise somehow.

Please identify the characteristics of the microphone used for the recordings and remove noise
lying outside the frequency range the microphone can capture, using a band-pass filter (BPF).

If the recording was made at a very high sampling rate (e.g. 96 kHz), we may apply
dither or delta-sigma modulation.
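
To make the band-pass idea concrete, here is a minimal sketch of a single second-order (biquad) band-pass section, using the widely circulated audio EQ cookbook formulas. The function names and the way the centre frequency and Q are derived from the band edges are assumptions for illustration. One biquad rolls off gently, so a real microphone-matched filter would probably be a higher-order design from a signal-processing library.

```python
import math


def bandpass_coefficients(low_hz, high_hz, sample_rate):
    """Return normalised (b, a) coefficients for one band-pass biquad."""
    f0 = math.sqrt(low_hz * high_hz)   # geometric centre of the pass band
    q = f0 / (high_hz - low_hz)        # approximate Q from the bandwidth
    w0 = 2.0 * math.pi * f0 / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)  # unity gain at the centre frequency
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def apply_biquad(samples, b, a):
    """Filter a sequence of samples with the direct-form I difference equation."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out
```

The band edges passed to `bandpass_coefficients` would come from the microphone's documented frequency response once the microphone has been identified.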

7. Noise reduction/suppression

Since there is a lot of noise, band-pass filtering alone will not be enough, and thus
we will need to apply noise suppression techniques (e.g. spectral subtraction, SS).

This should be done sentence by sentence, with the thresholds adjusted for each sentence.
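
Assuming SS here means spectral subtraction, the sketch below shows only the per-bin magnitude rule at its core: subtract a scaled noise estimate from each frequency bin and clamp the result to a small spectral floor. The function name, the `over_subtraction` and `floor` parameters and their default values are illustrative assumptions. These are the kind of settings that would be tuned for each sentence. Framing, the FFT and resynthesis with the original phase are left out.

```python
def spectral_subtract(frame_magnitudes, noise_magnitudes,
                      over_subtraction=1.0, floor=0.02):
    """Apply magnitude spectral subtraction to one analysis frame.

    frame_magnitudes -- magnitude spectrum of the noisy frame, one value per bin
    noise_magnitudes -- estimated noise magnitude per bin (e.g. from a pause)
    over_subtraction -- hypothetical per-sentence factor scaling the noise estimate
    floor            -- hypothetical per-sentence fraction of the noisy magnitude
                        kept as a minimum, to avoid negative or zero magnitudes
    """
    if len(frame_magnitudes) != len(noise_magnitudes):
        raise ValueError("frame and noise spectra must have the same number of bins")
    cleaned = []
    for magnitude, noise in zip(frame_magnitudes, noise_magnitudes):
        reduced = magnitude - over_subtraction * noise
        cleaned.append(max(reduced, floor * magnitude))
    return cleaned
```

A noisier sentence might use a larger `over_subtraction` than a cleaner one, at the cost of more audible artefacts, which is why adjusting the settings sentence by sentence matters.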


8. Normal voice building procedures

After the above steps, I can start the normal voice building procedures
including label generation and speaker adaptation.

Topic revision: r1 - 23 Apr 2010 - 11:07:52 - Main.s0566164
 