Psychoacoustic Models

Lista workshop 2012: Using an intelligibility measure to create noise robust cepstral coefficients for HMM-based speech synthesis

Link for samples

Interspeech 2011 paper: Simple modifications to synthetic speech

Link for samples

LSP shift: natural and synthetic speech

Natural speech

speech \ SNR -10dB -5dB 0dB 5dB 10dB clean
original
modified

Synthetic speech

speech \ SNR -10dB -5dB 0dB 5dB 10dB clean
original
modified

Distance Measures for Speech

Brief descriptions of different distance measures for speech:

distanceMeasures.pdf

Experiments:

Evaluating the behavior of speech measures

MGE Training results

Experiments:

MGE results

Psychoacoustic Model 1

  • MPEG-1 Layers I and II typically use Model 1
  • MPEG-1 Layer III (MP3) typically uses Model 2, which uses a different tonality estimation
  • Model 1 computes the masking threshold in the frequency domain, i.e. frequency masking (simultaneous masking)
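
As a rough illustration of frequency-domain (simultaneous) masking, the Python sketch below estimates a masking threshold for one frame. It is not the MPEG-1 Model 1 procedure, which separates tonal and non-tonal maskers before combining them. It is a simplification resting on assumptions not stated on this page: the Zwicker-Terhardt Bark approximation, Terhardt's threshold-in-quiet formula, Schroeder's spreading function, a fixed 10 dB offset between masker level and masked threshold, the strongest masker dominating at each frequency, and a calibration that places the strongest spectral bin at 96 dB SPL. All function names and default values are hypothetical.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark (critical-band) scale."""
    f = np.asarray(f_hz, dtype=np.float64)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute hearing threshold in dB SPL."""
    # Clamp to 20 Hz so the formula stays finite at the DC bin.
    khz = np.maximum(np.asarray(f_hz, dtype=np.float64), 20.0) / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * np.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

def spreading_db(dz):
    """Schroeder spreading function; dz = maskee Bark minus masker Bark."""
    x = dz + 0.474
    return 15.81 + 7.5 * x - 17.5 * np.sqrt(1.0 + x ** 2)

def masking_threshold_db(frame, fs, offset_db=10.0, ref_db=96.0):
    """Return (freqs, level_db, threshold_db) for one frame of samples."""
    frame = np.asarray(frame, dtype=np.float64)
    n = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hanning(n))) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Assumed calibration: the strongest bin sits at ref_db dB SPL.
    peak = max(power.max(), np.finfo(np.float64).tiny)
    level_db = 10.0 * np.log10(power / peak + 1e-12) + ref_db
    bark = hz_to_bark(freqs)
    dz = bark[:, None] - bark[None, :]  # rows: maskee bins, columns: masker bins
    # Each bin acts as a masker; keep the strongest contribution per maskee bin.
    masked_db = np.max(level_db[None, :] + spreading_db(dz) - offset_db, axis=1)
    threshold_db = np.maximum(masked_db, threshold_in_quiet_db(freqs))
    return freqs, level_db, threshold_db
```

Calling `masking_threshold_db(frame, 44100)` on a 1024-sample frame returns the bin frequencies, the calibrated spectrum and the estimated threshold; spectral components below the threshold would, under these assumptions, be inaudible.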

Experiments

Adding and removing noise according to masking threshold

Masking curves

Masking curve male speaker sampled at 44.1kHz:

jam_0010_28000.png

Male speaker (sampled 44.1kHz) samples:

  • Original speech signal: in_jam_0010.wav
  • Mask: mask_jam_0010.wav
  • Mask + original speech signal (SNR = 13 dB): out_jam_0010.wav
  • White noise + original speech signal (SNR = 13 dB): wn_out_jam_0010.wav
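
The samples above combine speech with a mask or with white noise at a stated SNR. A minimal Python sketch of how such a mixture can be produced follows. It assumes the SNR is the ratio of mean speech power to mean noise power over the whole utterance (the page does not say which definition was used) and that both signals have the same length; `mix_at_snr` and `measured_snr_db` are hypothetical helper names.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals snr_db, then add it.

    Returns the mixture and the gain applied to the noise.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if speech.shape != noise.shape:
        raise ValueError("speech and noise must have the same length")
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve p_speech / (gain**2 * p_noise) = 10**(snr_db / 10) for gain.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise, gain

def measured_snr_db(speech, noise):
    """Global SNR in dB from mean signal powers."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    return 10.0 * np.log10(np.mean(speech ** 2) / np.mean(noise ** 2))
```

The same function works whether `noise` is white noise or a mask signal derived from the speech; only the spectral shape of the added signal differs.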

jam_0022_T10.png
jam_0010_wn_4.95_13dB.png


Masking curve female speaker sampled at 44.1kHz:

meg_0003_68000.png

Female speaker (sampled 44.1kHz) samples:

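  • Original speech signal: in_meg_arctic_b0004.wav
  • Mask: mask_meg_arctic_b0004.wav
  • Mask + original speech signal (SNR = 13 dB): out_meg_arctic_b0004.wav
  • White noise + original speech signal (SNR = 13 dB): wn_out_meg_arctic_b0004.wav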

meg_b0004_T10.png
meg_b0004_wn_4.68_14dB.png

Topic attachments
Attachment Size Date Who Comment
Luka_part_outInv_T0.wav 344.6 K 16 Feb 2010 - 18:05 Main.s0968719
Luka_part_outInv_T10.wav 344.6 K 16 Feb 2010 - 18:06 Main.s0968719
Luka_part_outInv_T12.wav 344.6 K 16 Feb 2010 - 18:07 Main.s0968719
Luka_part_outInv_T14.wav 344.6 K 16 Feb 2010 - 18:07 Main.s0968719
Luka_part_outInv_T16.wav 344.6 K 16 Feb 2010 - 18:08 Main.s0968719
Luka_part_outInv_T18.wav 344.6 K 16 Feb 2010 - 18:08 Main.s0968719
Luka_part_outInv_T2.wav 344.6 K 16 Feb 2010 - 18:05 Main.s0968719
Luka_part_outInv_T20.wav 344.6 K 16 Feb 2010 - 18:08 Main.s0968719
Luka_part_outInv_T4.wav 344.6 K 16 Feb 2010 - 18:06 Main.s0968719
Luka_part_outInv_T6.wav 344.6 K 16 Feb 2010 - 18:06 Main.s0968719
Luka_part_outInv_T8.wav 344.6 K 16 Feb 2010 - 18:03 Main.s0968719
analysisModifications.pdf 206.3 K 24 Nov 2010 - 13:13 Main.s0968719
arctic_a0442_1_s.wav 156.3 K 08 Jun 2010 - 09:54 Main.s0968719
arctic_a0442_1_sn_-10dB.wav 156.3 K 07 Jun 2010 - 11:10 Main.s0968719
arctic_a0442_1_sn_-5dB.wav 156.3 K 07 Jun 2010 - 11:09 Main.s0968719
arctic_a0442_1_sn_0dB.wav 156.3 K 07 Jun 2010 - 11:07 Main.s0968719
arctic_a0442_1_sn_10dB.wav 156.3 K 07 Jun 2010 - 11:09 Main.s0968719
arctic_a0442_1_sn_5dB.wav 156.3 K 07 Jun 2010 - 11:08 Main.s0968719
arctic_a0442_2_s.wav 156.3 K 08 Jun 2010 - 09:55 Main.s0968719
arctic_a0442_2_sn_-10dB.wav 156.3 K 07 Jun 2010 - 11:12 Main.s0968719
arctic_a0442_2_sn_-5dB.wav 156.3 K 07 Jun 2010 - 11:12 Main.s0968719
arctic_a0442_2_sn_0dB.wav 156.3 K 07 Jun 2010 - 11:08 Main.s0968719
arctic_a0442_2_sn_10dB.wav 156.3 K 07 Jun 2010 - 11:11 Main.s0968719
arctic_a0442_2_sn_5dB.wav 156.3 K 07 Jun 2010 - 11:11 Main.s0968719
distanceMeasures.pdf 62.9 K 25 Oct 2010 - 14:00 Main.s0968719 Brief descriptions of different distance measures for speech
in_jam_0010.wav 220.4 K 11 Nov 2009 - 11:22 Main.s0968719 Original speech signal -- male speaker sampled at 44.1kHz
in_meg_arctic_b0004.wav 316.0 K 11 Nov 2009 - 11:59 Main.s0968719 Original speech signal -- female
jam_0010_28000.png 14.3 K 08 Feb 2010 - 14:26 Main.s0968719 Masking curve male speaker sampled at 44.1kHz
jam_0010_T10.png 5.6 K 08 Feb 2010 - 15:50 Main.s0968719
jam_0010_wn_4.95_13dB.png 4.7 K 08 Feb 2010 - 15:46 Main.s0968719
jam_0014_1_s.wav 150.8 K 07 Jun 2010 - 11:30 Main.s0968719
jam_0014_1_sn_-10dB.wav 150.8 K 07 Jun 2010 - 11:45 Main.s0968719
jam_0014_1_sn_-5dB.wav 150.8 K 07 Jun 2010 - 11:46 Main.s0968719
jam_0014_1_sn_0dB.wav 150.8 K 07 Jun 2010 - 11:43 Main.s0968719
jam_0014_1_sn_10dB.wav 150.8 K 07 Jun 2010 - 11:44 Main.s0968719
jam_0014_1_sn_5dB.wav 150.8 K 07 Jun 2010 - 11:44 Main.s0968719
jam_0014_2_s.wav 150.8 K 07 Jun 2010 - 11:30 Main.s0968719
jam_0014_2_sn_-10dB.wav 150.8 K 07 Jun 2010 - 11:49 Main.s0968719
jam_0014_2_sn_-5dB.wav 150.8 K 07 Jun 2010 - 11:48 Main.s0968719
jam_0014_2_sn_0dB.wav 150.8 K 07 Jun 2010 - 11:46 Main.s0968719
jam_0014_2_sn_10dB.wav 150.8 K 07 Jun 2010 - 11:48 Main.s0968719
jam_0014_2_sn_5dB.wav 150.8 K 07 Jun 2010 - 11:47 Main.s0968719
jam_0022_outInv_T0.wav 235.1 K 16 Feb 2010 - 18:02 Main.s0968719
jam_0022_outInv_T10.wav 235.1 K 16 Feb 2010 - 18:03 Main.s0968719
jam_0022_outInv_T12.wav 235.1 K 16 Feb 2010 - 18:04 Main.s0968719
jam_0022_outInv_T14.wav 235.1 K 16 Feb 2010 - 18:04 Main.s0968719
jam_0022_outInv_T16.wav 235.1 K 16 Feb 2010 - 18:04 Main.s0968719
jam_0022_outInv_T18.wav 235.1 K 16 Feb 2010 - 18:04 Main.s0968719
jam_0022_outInv_T2.wav 235.1 K 16 Feb 2010 - 18:03 Main.s0968719
jam_0022_outInv_T20.wav 235.1 K 16 Feb 2010 - 18:05 Main.s0968719
jam_0022_outInv_T4.wav 235.1 K 16 Feb 2010 - 18:03 Main.s0968719
jam_0022_outInv_T6.wav 235.1 K 16 Feb 2010 - 18:03 Main.s0968719
jam_0022_outInv_T8.wav 235.1 K 16 Feb 2010 - 18:57 Main.s0968719
mask_jam_0010.wav 220.4 K 08 Feb 2010 - 14:35 Main.s0968719 Mask created from speech signal -- male speaker sampled at 44.1kHz
mask_meg_arctic_b0004.wav 244.2 K 08 Feb 2010 - 14:36 Main.s0968719
meg_0003_68000.png 15.3 K 08 Feb 2010 - 14:18 Main.s0968719 Masking curve female speaker sampled at 44.1kHz
meg_arctic_b0032_outInv_T0.wav 156.3 K 16 Feb 2010 - 18:09 Main.s0968719
meg_arctic_b0032_outInv_T10.wav 156.3 K 16 Feb 2010 - 18:11 Main.s0968719
meg_arctic_b0032_outInv_T12.wav 156.3 K 16 Feb 2010 - 18:11 Main.s0968719
meg_arctic_b0032_outInv_T14.wav 156.3 K 16 Feb 2010 - 18:12 Main.s0968719
meg_arctic_b0032_outInv_T16.wav 156.3 K 16 Feb 2010 - 18:12 Main.s0968719
meg_arctic_b0032_outInv_T18.wav 156.3 K 16 Feb 2010 - 18:12 Main.s0968719
meg_arctic_b0032_outInv_T2.wav 156.3 K 16 Feb 2010 - 18:09 Main.s0968719
meg_arctic_b0032_outInv_T20.wav 156.3 K 16 Feb 2010 - 18:12 Main.s0968719
meg_arctic_b0032_outInv_T4.wav 156.3 K 16 Feb 2010 - 18:10 Main.s0968719
meg_arctic_b0032_outInv_T6.wav 156.3 K 16 Feb 2010 - 18:10 Main.s0968719
meg_arctic_b0032_outInv_T8.wav 156.3 K 16 Feb 2010 - 18:11 Main.s0968719
meg_b0004_T10.png 5.7 K 08 Feb 2010 - 15:52 Main.s0968719
meg_b0004_wn_4.68_14dB.png 5.1 K 08 Feb 2010 - 15:53 Main.s0968719
out_jam_0010.wav 220.4 K 08 Feb 2010 - 14:35 Main.s0968719 Mask + original speech signal -- male speaker sampled at 44.1kHz
out_meg_arctic_b0004.wav 244.2 K 08 Feb 2010 - 14:35 Main.s0968719 Mask + original speech signal
peakCEP_0.000.wav 232.5 K 24 Nov 2010 - 13:03 Main.s0968719
peakCEP_0.200.wav 96.9 K 24 Nov 2010 - 12:51 Main.s0968719
peakCEP_0.500.wav 96.9 K 24 Nov 2010 - 12:51 Main.s0968719
peakCEP_0.800.wav 96.9 K 24 Nov 2010 - 12:52 Main.s0968719
peakLSP_0.600.wav 94.6 K 24 Nov 2010 - 12:50 Main.s0968719
peakLSP_0.700.wav 94.6 K 24 Nov 2010 - 12:50 Main.s0968719
peakLSP_0.800.wav 94.6 K 24 Nov 2010 - 12:51 Main.s0968719
rate_1.000.wav 94.6 K 24 Nov 2010 - 12:46 Main.s0968719
rate_1.400.wav 132.5 K 24 Nov 2010 - 12:48 Main.s0968719
rate_2.000.wav 189.3 K 24 Nov 2010 - 12:47 Main.s0968719
shiftF0_1.200.wav 94.6 K 24 Nov 2010 - 12:43 Main.s0968719
shiftF0_1.300.wav 94.6 K 24 Nov 2010 - 12:43 Main.s0968719
shiftF0_1.500.wav 94.6 K 24 Nov 2010 - 12:44 Main.s0968719
shiftLSP_1.025.wav 94.6 K 24 Nov 2010 - 12:48 Main.s0968719
shiftLSP_1.050.wav 94.6 K 24 Nov 2010 - 12:49 Main.s0968719
shiftLSP_1.075.wav 94.6 K 24 Nov 2010 - 12:50 Main.s0968719
wn_out_jam_0010.wav 220.4 K 08 Feb 2010 - 14:37 Main.s0968719 White noise + speech
wn_out_meg_arctic_b0004.wav 244.2 K 08 Feb 2010 - 14:36 Main.s0968719 White noise + speech
Topic revision: r29 - 30 Apr 2012 - 08:27:15 - Main.s0968719
 