-- RavichanderVipperla - 10 May 2010


At this point, the architecture of the final system seems to be:

  1. EigenMike connects to EMIB.
  2. EMIB connects to the MacBook (running Windows) through FireWire.
  3. The Podium software on the MacBook does the beamforming.
  4. The beamformed output is sent to a remote machine over TCP/IP. A VST plugin (author: Mike) that plugs into the Podium software will handle this network connection.
  5. An online ASR runs on the remote machine and decodes the beamformed sound streaming in through a socket.

To Do:

  1. Check with Simon if the EMIME project meeting files on the MacBook can be moved/deleted.
  2. Get the MacBook to stream one of the beamformed outputs to its headphone jack.
  3. Decode with the online ASR on the DICE machine with input from the EigenMike.


Initial setup (EigenMike to DICE machine through a cable connection) as described below works end-to-end:

  1. In Podium, set the master output to (EigenMike-EMIB Headphone L+R). I presume this should only have the effect of routing the output to the EMIB headphone outputs.
  2. Connect the headphone jack of the EMIB to the mic input of the DICE machine. (Turn the volume knob on the EMIB to the highest value, i.e. the rightmost position.)
  3. Check with some sound recording software on DICE whether the input is being captured.
  4. Run the recogniser.


  1. Matching the sound level between the EMIB and the DICE machine is a bit tricky. Plugging the input into the microphone port may not be a good idea: it is OK when the input level is low, but higher input volumes might damage the sound card on the DICE machine.
  2. There is a background buzz in the input that seems to affect the recognition accuracy.

To Do:

  1. Get the VST plugin for TCP/IP connection from Mike Lincoln and plug it into Podium.
  2. Try to increase the preamplifier gain for the EigenMike and see if ASR accuracies improve.

- Tried it, but not much change; the preamplifier gain is currently set to 10 dB.


  1. Got the TCP/IP VST plugin code from Mike.
  2. Installed Microsoft Visual C++ Express 2010 on the Mac (Windows).
  3. Downloaded the VST SDK.
  4. Working directory: c:\Documents and Settings\em32\My Documents\Visual Studio 2010\Projects\vst_sdk2_3\vst_sdk2_3\vst_sdk2_3\SocketInterface2

Changes made in the code to get it to compile:

1) SocketInterface2Main.cpp:

Replaced: AEffect* main (audioMasterCallback audiomaster)

with: AEffect* VSTPluginMain (audioMasterCallback audiomaster)

2) SocketInterface2.def:

Changed: Exports main

to: Exports main=VSTPluginMain

3) Properties (Alt+F7):

Changed the paths in : Custom Build Setup: General: Command Line: copy "..." "..."

To Do:

  1. Load the SocketInterface2.dll plugin into Podium and play with it.
  2. Get Podium to stream the speech to the DICE machine.


  1. To test whether the audio is being streamed by the VST plugin, set up a client program to connect to port 4001 on localhost. (Good references for Winsock programming: 1) http://msdn.microsoft.com/en-us/library/ms738545(v=VS.85).aspx 2) Beej's Guide to Network Programming http://beej.us/guide/bgnet )
  2. The connection between the client and the VST plugin (server), both running on localhost, seems fine, but there is no audio streaming.
  3. The ASR system doesn't connect to the VST plugin. Need to figure this out.


  1. Mike has given me the license key for Bidule (this is definitely much easier to use than Podium).
  2. The ASR system reads the port information from StreamSocketSource_Port and not SocketSource_Port, so this information was added to the configuration.
  3. Streamed an audio file and got it decoded by the ASR. But there seems to be a problem with streaming the audio from the EigenMike.
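For reference, the fix in item 2 amounts to a single line in the recogniser configuration. The key=value syntax below is an assumption (the exact config format depends on the ASR front end), though the key name and port come straight from the setup above:

```
StreamSocketSource_Port = 4001
```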

To Do:

  1. Play around with Bidule and the EigenMike VST plugin. If these two work together well, this is preferred over Podium software.
  2. Need to figure out why the audio doesn't stream.

SocketInterface2.dll is not the problem.


  1. Set up Bidule to get the input from the microphone array and redirect it to both a socket stream and a file.
  2. Initial tests with the ASR work fine, though the acoustic models don't work so well with my voice.

To Do:

  1. Need to do some recordings in the IMR with different beamforming configurations.


  1. Made some recordings in the IMR, using omni and dipole patterns, but can't see a distinct drop in dB at 90 degrees using the dipole configuration.
  2. Added a gain module to the beamformed output.
  3. Note: The preamplifier gain gets reset when the EigenMike is powered off. A gain of 30 dB seems to be good.
  4. Tried decoding some utterances standing at different distances from the mic. Not much insight, as the error rates are high with my accent even at close distance. Maybe I should play back some prerecorded audio.


  1. The directional pattern of the EigenMike is not clearly documented, so it is not clear which direction corresponds to azimuth 0, azimuth 180, elevation 0, or elevation 180.
  2. To understand this, I recorded my voice on my phone counting from 1 to 60 (trying to maintain the same sound level). The number utterances are approximately spaced 1 second apart. This sound was played back from the phone from various directions in various experiments as explained below.
  3. Setting the directional pattern to 'First order cardioid', several recordings were made with varying values of elevation(0,90,180) and azimuth(0,90,180,270). These recordings are stored in 'C:\Documents and Settings\em32\Desktop\Recordings'. They are named using the convention 'numbers-card1_elev_x_azim_x'.
  4. For each recording, the source (playback of my utterances) was placed in one direction for a count of 4, then moved to another direction for the next count of 4, and so on. Assuming that the supporting stand of the EigenMike is farthest from me, the directions of the source placement are: <1..4>-front, <5..8>-right, <9..12>-back, <13..16>-left, <17..20>-top, <21..24>-bottom.
  5. From the recording volumes, the azimuth and elevation directions are now clear. The details are written in the EigenMike page.


  1. Inspace recording (21/06, 1-2 PM) - Robotics introduction for school kids. Captured the audio from the speakers (Dr. Jon Orblander and Dr. Sethu Vijaykumar). The microphone was set up near the west end of the speaker arena; the distance from the microphone to the speaker is about 5 metres. Captured using the cardioid beamform pattern.
  2. Inspace recording (21/06, 6-8 PM) - Encounters - Here Comes the Sun. Recorded the guest lectures by Prof. Russell Foster (University of Oxford), Prof. Kenneth Brophy (University of Glasgow, has a Scottish accent), and Bill McArthur (astronaut, NASA, has a US accent). Recording setup similar to that used for the afternoon robotics recording.


  1. From the Inspace recordings, cut out 10 shorter utterances per speaker for 5 speakers.
  2. These files are stored on the EigenMike Mac laptop in 'Desktop\chunking'.
  3. The wavefiles were downsampled to 16 kHz (since the acoustic models are trained on 16 kHz waveforms): 'Desktop\chunking\testSet'.
  4. These test waveforms were transcribed (transcriptionTestSet). (Note: The transcripts are not of the best quality. I didn't transcribe the short pauses, repetitions, umms, ohhs, etc., and I couldn't make out what some words were despite having attended the lecture session and listening to the wavefile several times. God help the ASR system :-) )


  1. Set up a basic offline recognition system using the ICSINISTISL acoustic models adapted with 40 hours of AMI speech.
  2. The setup can be found on the Eddie cluster at /exports/informatics/inf_cstr_srenals_01/ravi/inspace/exp/baseline.
  3. The recognition accuracies are really low. Can't state the exact figures since the transcripts are not perfect, but very few words were correctly recognised. The problem is compounded by the large number of out-of-vocabulary words and the mismatch in domain between the test set and the training data.


It was decided that the AMI offline setup be used as a better baseline.


After a lot of hacks, the AMI offline recogniser was made to work with the Inspace test set.

The setup can be found on eddie at /exports/informatics/inf_cstr_srenals_01/ravi/offlineInspace .

The results are certainly better than the original baseline experiments, but the accuracies are still quite low (due to OOV??).


  1. Meeting to set up a demo system interfacing the dialogue system, speech, and virtual characters. Attendees: Prof. Steve Renals, Dr. Colin Matheson, Dr. Jochen Ehnes, and Ravi Vipperla.
  2. We went to Inspace to figure out an appropriate system to demonstrate.
  3. It was decided that some objects would be placed in Inspace; users would interact with virtual characters using speech, and the virtual characters would move close to the objects and give some description of them. It was not decided what the objects would be.
  4. It is not clear at this point how the interfaces between the various components will look. I believe the dialogue system will be in Prolog and the ASR in C/C++; I am not sure about the language used for the virtual characters software.
  5. My task at this point is to develop a socket interface for ATK.


  1. Still stuck with the socket interface. The architecture of ATK is not entirely clear, and hence I am struggling to understand which parts of the code to change.
  2. Prof. Steve Young says that there is a VoIP version of ATK already developed for the CLASSIC project. Xingkun Liu (now with Heriot-Watt University) shares the source code of the VoIP version of ATK. It is located at '/group/project/talk/camVoip18June2010'.
  3. This version of ATK is based on the SIP protocol. However, comparing the original code with this code makes it much easier to figure out the functionality of each module.


Recorded the full-day meeting in the IMR with 8 beams separated by 45 degrees from each other. The recordings are placed on the EigenMike Mac laptop at 'Desktop/Recordings'. They are named 'meeting_23072010_azimuth_<angle>_<recordingNumber>'.

<angle> - 0 is facing the screen/speaker, increasing in the counter-clockwise direction.

<recordingNumber> - 3-Morning session, 4-Afternoon session

The Bidule layout to record the 8 beams is available on the Mac laptop at 'C:\Program Files\Plogue\Bidule\layouts\EigenMike_multiDirectionRecording'


The socket version of ATK works. The details of the software and current issues are documented at AtkSocket.

Topic revision: r9 - 05 Aug 2010 - 12:20:17 - RavichanderVipperla