Speak! speech synthesis meeting.

The purpose of these regular informal meetings is to discuss and share progress relating to speech synthesis (audio and visual) research - within CSTR specifically as well as in the field generally. Talks are intended to be short and informal, with an emphasis on discussion, interaction and feedback. Relevant references should be sent round in advance to encourage everyone to contribute.

Everybody with an interest in speech synthesis (audio and visual) research is welcome.

  • Meetings will typically be held in the Instrumented Meeting Room ("IMR", room 3.07) on Level 3 of the Informatics Forum building, though this may occasionally vary.
  • At present, the standard time for these meetings is Thursdays at 2-3pm.

Speak! synthesis meetings schedule for 2014/2015

25.09.14 Schedule planning meeting (bring ideas + be ready to volunteer!)
02.10.14 Mirjam - Pronunciation variation for TTS (Kolluru et al., Brognaux et al., Lecumberri et al.)
09.10.14 Rob/Cassie - Talker variability (Bailly & Martin, Luan et al.)
16.10.14 Gustav/Cassia - Postfilter (DNN postfilter, GV)
23.10.14 Zhizheng - Sequence-based training for DNN (LSTM TTS, Sequence error DNN for VC)
30.10.14 Evaluation for next Blizzard Challenge - child evaluation; example audiobook data; evaluation guidelines (document)
06.11.14 Simon - waveform synthesis IS140893.PDF (background: IS080193.PDF)
13.11.14 Shinnosuke Takamichi - Modulation spectrum-based approach to high-quality statistical parametric speech synthesis
20.11.14 Evaluation guidelines
27.11.14 no meeting (Christmas lunch)
04.12.14 Rosie - Spanish evaluation
11.12.14 Gustav - New loss functions and distributions for speech synthesis
18.12.14 no meeting
08.01.15 no meeting
15.01.15 Planning meeting
22.01.15 Ruben
29.01.15 no meeting - NST meeting
05.02.15 Felipe - vocoder journal paper (here)
12.02.15 Cassia - Restoring high frequency components from low-sampling-rate speech (paper here)
19.02.15 Sam - CWT Perceptual Experiments + MOS-MUSHRA discussion (notes here)
26.02.15 Mirjam - A trio of random interesting papers from SLT (Lara Martin et al., Gina-Anne Levow et al., Verena Venek et al.)
05.03.15 Qiong - ICASSP: Vocaine vocoder paper here & Fusion vocoder; Simon - speech pre-enhancement (paper here, samples here)
12.03.15 Tom - Interspeech paper
19.03.15 Rob - Prosody discussion
26.03.15 Rasmus - presentation of work during Google
02.04.15 no meeting - Easter
09.04.15 Gustav - Quality prediction for TTS (1st priority: journal paper, 2nd priority: KLD Interspeech paper)
16.04.15 No meeting
23.04.15 Srikanth presentation
30.04.15 No meeting
07.05.15 Oliver - hybrid synthesis and speech enhancement
14.05.15 Cassia - Modelling the waveform using DNNs (paper here) + Planning - ICASSP papers
21.05.15 A Script for Machine Synthesis
28.05.15 No meeting (NST meeting)
04.06.15 Rob - ICASSP prosody papers
11.06.15 Korin
18.06.15 Simon - A Mouth Full Of Words: Visually Consistent Acoustic Redubbing (paper + demo) - just because it's amusing
25.06.15 No meeting
02.07.15 No meeting (UK Speech)
09.07.15 Tom
16.07.15 Zhizheng - the effects of DNN in SPSS (paper)
23.07.15 no meeting
30.07.15 Gustav - Waveform-level probabilistic modelling (Achan et al.)
06.08.15 Summer school summary
13.08.15 no meeting
20.08.15 Qiong 3rd year review (10am)
27.08.15 no meeting
03.09.15 Interspeech practice talks/posters

  • Recurrent latent variable model for sequential data: paper

Details of suggested papers to read:

Acoustic modelling etc:

  • Chunwijitra, Nose & Kobayashi. A speech parameter generation algorithm using local variance for HMM-based speech synthesis. In Proc. Interspeech, 2012


  • Eyben et al.: Unsupervised Clustering of Emotion and Voice Styles for Expressive TTS. In Proc. ICASSP, 2012

Scratchpad for other suggestions for meeting topics:

* Catherine - Talk on quotation work

* Statistical Text-to-Speech Synthesis with Improved Dynamics. Stas Tiomkin, David Malah; Technion IIT, Israel. Proc Interspeech 2008

* Tomoki Toda's work on voice conversion using less than one sentence of speech

* The Expression and Perception of Emotions: Comparing Assessments of Self versus Others. Carlos Busso, Shrikanth S. Narayanan; University of Southern California, USA. Proc Interspeech 2008

* Scripted Dialogs versus Improvisation: Lessons Learned About Emotional Elicitation Techniques from the IEMOCAP Database. Carlos Busso, Shrikanth S. Narayanan; University of Southern California, USA. Proc Interspeech 2008

* ZZT transform (Dutoit's student thesis) - IEEE journal paper (ACTION ON Matthew to find this paper)

-- Main.korin - 12 Sep 2013

Speak! meeting schedules

