ClusterSpeechToTextTel

The ClusterSpeechToTextTel task clusters the two speakers in a phone call, and uses the resulting speaker clusters to slightly improve speech-to-text performance through speaker-sided acoustic normalization. As before, any telephony artifacts such as dial tones or DTMF tones are included in the output, interspersed with the recognized words.

Parameters

Parameter           Description                                                                         Required
Type                The task name. Set to ClusterSpeechToTextTel.                                       Yes
Conf                Whether to generate word confidence scores.
Diag                Whether to generate diagnostic information.
DiagFile            The file to write the diagnostic information to.
DnnScale            The DNN output acoustic score scaling factor.
File                The input audio file.
FixTime             A fixed size for speaker clusters.
Lang                The name of a language pack.
LatFile             The name of the lattice file that contains word hypotheses.
LatScale            The depth of the lattice.
LatWinSize          The size (in seconds) of the lattice output window.
LatWordFile         A list of words to find.
MaxNumSpeakers      The final maximum number of speakers to produce.
MergeThresh         The threshold below which to merge clusters.
MinNumSpeakers      The final minimum number of speakers to produce.
Mode                The algorithm mode for the speech-to-text process.
ModeValue           Sets the value of the parameter associated with the speech-to-text algorithm mode.
NormFile            The acoustic normalization file to use.
Out                 The file that IDOL Speech Server writes task output to.
SampleFrequency     The sample frequency of the audio file to process.
SilThresh           The threshold between what the task identifies as silence and non-silence.
SpeechThresh        The threshold between speech and non-speech (music or noise).
SugdInputChannels   The channel layout of the input media file.
SugdInputFrequency  The sampling rate of the input media file.

Example

http://localhost:13000/action=AddTask&Type=ClusterSpeechToTextTel&File=C:/myData/Speech.wav&Out=SpeechTranscript.ctm&Lang=ENUS

This action uses port 13000 to instruct IDOL Speech Server, which is located on the local machine, to perform the ClusterSpeechToTextTel task on the Speech.wav file and write the results to the SpeechTranscript.ctm file. The Speech.wav file contains U.S. English dialect speech.
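If the action is issued from a script rather than typed into a browser, the URL can be assembled from the task parameters. The following Python sketch is illustrative only: the helper name build_add_task_url is hypothetical and not part of IDOL Speech Server, and the host, port, and parameter values are taken from the example above. It only builds and prints the URL; it does not contact the server.

```python
from urllib.parse import urlencode, quote


def build_add_task_url(host, port, task_type, **params):
    """Build an AddTask action URL (hypothetical helper, not a product API).

    The parameters are joined in the same action=...&Name=Value form
    that the example above uses.
    """
    query = {"action": "AddTask", "Type": task_type, **params}
    # Keep '/' and ':' unescaped so that file paths such as
    # C:/myData/Speech.wav appear exactly as in the example.
    encoded = urlencode(query, quote_via=quote, safe="/:")
    return f"http://{host}:{port}/{encoded}"


url = build_add_task_url(
    "localhost",
    13000,
    "ClusterSpeechToTextTel",
    File="C:/myData/Speech.wav",
    Out="SpeechTranscript.ctm",
    Lang="ENUS",
)
print(url)
```

Optional parameters from the table, such as Conf or MaxNumSpeakers, could be passed as further keyword arguments in the same way. Their accepted values are not described in this section, so check them before use.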

