Speak! speech synthesis meeting.

The purpose of these regular informal meetings is to discuss and share progress relating to speech synthesis (audio and visual) research - within CSTR specifically as well as in the field generally. Talks are intended to be short and informal, with an emphasis on discussion, interaction and feedback. Relevant references should be sent round in advance to encourage everyone to contribute.

Everybody with an interest in speech synthesis (audio and visual) research is welcome.

  • Meetings will typically be held in the Instrumented Meeting Room ("IMR" - room 3.07), on Level 3 of the Informatics Forum building, though the venue may occasionally vary.
  • At present, the standard time for these meetings is Thursdays at 2-3pm.

Speak! synthesis meetings schedule for 2013/2014

19.09.13 Schedule planning meeting (bring ideas + be ready to volunteer!)
26.09.13 Tuomo - IS2013 papers on glottal source (OQ estimation (Kane et al.) + Excitation modeling (Cabral))
03.10.13 Atef - Body language synthesis (reading + proposal) (Bozkurt et al. + Yang et al.)
10.10.13 Catherine - IS2013 paper on f0 alternatives (Slaney et al. 2013)
17.10.13 Rasmus - Realtime HMM-based synthesis / incremental TTS - recommended reading: Baumann et al.
24.10.13 Rob - IS2013 prosody papers (Brognaux, Picart and Drugman 2013; Nagata, Mori and Nose 2013)
31.10.13 no meeting
07.11.13 Mirjam & Rasmus - Disfluency perception in speech synthesis
14.11.13 Cassia - MLSA filter issues
21.11.13 no meeting (talk by CSTR visitor Tatsuya Kawahara from U Kyoto due at this time)
28.11.13 no meeting (many absences anticipated)
05.12.13 no meeting
12.12.13 no meeting
19.12.13 no meeting
09.01.14 no meeting
23.01.14 Matthew Aylett - IDLAK
30.01.14 Tuomo Raitio - talk on "Deep neural networks for voice source modelling"
06.02.14 Korin - Alternatives to GMM acoustic modelling for synthesis/conversion (postponed to 20.03.2014)
13.02.14 David Braude - DDD presentation
20.02.14 Tom Merritt
27.02.14 Schedule planning meeting (bring ideas + be ready to volunteer!)
06.03.14 no meeting
13.03.14 Cassia - fast speech experiments
20.03.14 Korin - Alternatives to GMM acoustic modelling for speech synthesis - Chen et al - Interspeech 2013 (btw, many similarities to a paper we've looked at before, which you can review for background if desired - Ling et al (2013) - ICASSP)
27.03.14 no meeting -- SICSA Speech HCI workshop
03.04.14 Rasmus & Mirjam - Automatic filled pauses insertion
10.04.14 no meeting
17.04.14 no meeting - Easter
24.04.14 Qiong - Talk on "Sinusoidal model and its application for statistical speech synthesis" (recommended reading: Cappe et al.)
01.05.14 Rob - Structured Bayesian induction (for TTS?)
08.05.14 ?? no meeting - ICASSP
15.05.14 Planning meeting
22.05.14 no meeting - Speech prosody

Catherine - Some papers from speech prosody 2014 by Brognaux et al. on emphasis and situational differences

05.06.14 Korin & Qiong - Paper reading on speech synthesis based on neural network ICASSP (p3872-zen and p3857-qian)
12.06.14 Sam
19.06.14 Mirjam - non-native speech processing ICASSP

Gustav - Alternatives to GV

1) Variance scaling, Silén et al., Interspeech 2012 (Sound examples)

2) Modulation-spectrum based postfiltering, Takamichi et al., ICASSP 2014 (Sound examples)

03.07.14 Korin - Linear Dynamic Models: Tsiaras et al. (2014 - ICASSP) and Quillen (2010 - ICASSP)
10.07.14 Benigno - NADE
17.07.14 Qiong - vocoding ICASSP (babacan14.pdf and bollepalli14.pdf)
24.07.14 Cenk Demiroglu (visitor)
31.07.14 Qiong - Second year review presentation
07.08.14 no meeting
14.08.14 Gustav & Peter - Voice conversion (Toda et al. (2007))

  • Bozkurt et al.: Multimodal Analysis of Speech Prosody and Upper Body Gestures using Hidden Semi-Markov Models, ICASSP 2013
  • Yang et al.: Toward Body language Generation in Dyadic Interaction Settings from Interlocutor Multimodal Cues, ICASSP 2013

Details of suggested papers to read:

Acoustic modelling etc:

  • Chunwijitra, Nose & Kobayashi. A speech parameter generation algorithm using local variance for HMM-based speech synthesis. In Proc. Interspeech, 2012


  • Eyben et al.: Unsupervised Clustering of Emotion and Voice Styles for Expressive TTS. In Proc. ICASSP, 2012

Scratchpad for other suggestions for meeting topics:

* Catherine - Talk on quotation work

* Statistical Text-to-Speech Synthesis with Improved Dynamics. Stas Tiomkin, David Malah; Technion IIT, Israel. Proc Interspeech 2008

* Tomoki Toda's work on voice conversion using less than one sentence of speech

* The Expression and Perception of Emotions: Comparing Assessments of Self versus Others. Carlos Busso, Shrikanth S. Narayanan; University of Southern California, USA. In Proc. Interspeech 2008

* Scripted Dialogs versus Improvisation: Lessons Learned About Emotional Elicitation Techniques from the IEMOCAP Database. Carlos Busso, Shrikanth S. Narayanan; University of Southern California, USA. In Proc. Interspeech 2008

* ZZT transform (Dutoit's student thesis) - IEEE journal paper (ACTION ON Matthew to find this paper)


