IDOL Speech Server provides five preconfigured speech-to-text tasks:
WavToText, which performs speech-to-text on an audio file.
StreamToText, which performs speech-to-text on an audio stream.
StreamToTextMusicFilter, which performs speech-to-text on an audio stream and categorizes the audio so that you can remove any sections categorized as music or noise from the resulting .CTM file.
TelWavToText, which performs speech-to-text on audio files of telephone conversations. The task also detects and reports dial tones and DTMF dial tones (see DTMF Identification).

To run speech-to-text on an audio file
Send an AddTask action to IDOL Speech Server, and set the following parameters:
Type | The task name. Specify WavToText.
File | The audio file to process. To restrict processing to a section of the audio file, set the start and end times in the
Out | The file to write the transcription to.
Lang | The language pack to use.
For example:
http://localhost:13000/action=AddTask&Type=WavToText&File=C:/myData/Speech.wav&Out=SpeechTranscript.ctm&Lang=ENUS
This action uses port 13000 to instruct IDOL Speech Server, which is located on the local machine, to perform the WavToText task on the Speech.wav file and write the results to the SpeechTranscript.ctm file. The Speech.wav file contains U.S. English dialect speech.
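The action URL can also be assembled programmatically. The following Python sketch is illustrative only: the helper name build_add_task_url and its keyword arguments are assumptions made for this example and are not part of IDOL Speech Server. It uses only the AddTask parameters described above and reproduces the example URL. Additional parameters can be passed through as keyword arguments.

```python
from urllib.parse import quote


def build_add_task_url(file, out, lang, task_type="WavToText",
                       host="localhost", port=13000, **extra):
    """Build an AddTask action URL (hypothetical helper, not an IDOL API)."""
    # Parameter names follow the table above; insertion order is preserved.
    params = {"Type": task_type, "File": file, "Out": out, "Lang": lang, **extra}
    # Percent-encode values, leaving '/' and ':' readable in file paths.
    query = "&".join(
        f"{name}={quote(str(value), safe='/:')}" for name, value in params.items()
    )
    return f"http://{host}:{port}/action=AddTask&{query}"


url = build_add_task_url("C:/myData/Speech.wav", "SpeechTranscript.ctm", "ENUS")
print(url)
```

Percent-encoding the values is a precaution for file names that contain spaces or other reserved characters. Whether the server requires it is not stated here.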
If you are using a lattice file and want to reduce the lattice output size by including only one sample of each word in a specific window size, you can also set the LatWinSize parameter. See Use a Lattice File and the IDOL Speech Server Reference for more information.
This action returns a token. You can use the token to:
When you use IDOL Speech Server to process multiple data streams or files at the same time, the server might not have enough CPU or memory to process all of them at once, because speech-to-text is very CPU-intensive. To check whether a server has sufficient resources to run a WavToText task, send a CheckResources action. See Check Available Resources.