The ClusterSpeechToTextTel task clusters the two speakers in a phone call and uses the resulting speaker clusters to slightly improve speech-to-text performance through speaker-sided acoustic normalization. As before, any telephony artifacts, such as dial tones or DTMF tones, are included in the output, interspersed with the recognized words.
Parameter | Description | Required |
---|---|---|
Type | The task name. Set to ClusterSpeechToTextTel. | Yes |
Conf | Whether to generate word confidence scores. | |
Diag | Whether to generate diagnostic information. | |
DiagFile | The file to write the diagnostic information to. | |
DnnScale | The DNN output acoustic score scaling factor. | |
File | The input audio file. | |
FixTime | A fixed size for speaker clusters. | |
Lang | The name of a language pack. | |
LatFile | The name of the lattice file that contains word hypotheses. | |
LatScale | The depth of the lattice. | |
LatWinSize | The size (in seconds) of the lattice output window. | |
LatWordFile | A list of words to find. | |
MaxNumSpeakers | The final maximum number of speakers to produce. | |
MergeThresh | The threshold below which to merge clusters. | |
MinNumSpeakers | The final minimum number of speakers to produce. | |
Mode | The algorithm mode for the speech-to-text process. | |
ModeValue | Sets the value of the parameter associated with the speech-to-text algorithm mode. | |
NormFile | The acoustic normalization file to use. | |
Out | The file that IDOL Speech Server writes task output to. | |
SampleFrequency | The sample frequency of the audio file to process. | |
SilThresh | The threshold between what the task identifies as silence and non-silence. | |
SpeechThresh | The threshold between speech and non-speech (music or noise). | |
SugdInputChannels | The channel layout of the input media file. | |
SugdInputFrequency | The sampling rate of the input media file. | |
http://localhost:13000/action=AddTask&Type=ClusterSpeechToTextTel&File=C:/myData/Speech.wav&Out=SpeechTranscript.ctm&Lang=ENUS
This action uses port 13000 to instruct IDOL Speech Server, which is located on the local machine, to perform the ClusterSpeechToTextTel task on the Speech.wav file and write the results to the SpeechTranscript.ctm file. The Speech.wav file contains U.S. English dialect speech.
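
The example action above can also be assembled programmatically. The following Python sketch is illustrative only: the helper name `build_add_task_url` is hypothetical and not part of IDOL Speech Server, and percent-encoding the parameter values (while leaving `/` and `:` untouched) is an assumption about what the server accepts. It uses only the host, port, and parameters shown in the example.

```python
from urllib.parse import quote


def build_add_task_url(host, port, params):
    """Build an AddTask action URL in the same shape as the example above.

    Hypothetical helper. Values are percent-encoded, but '/' and ':' are
    left as-is so a path such as C:/myData/Speech.wav stays readable.
    """
    parts = []
    for name, value in params.items():
        encoded = quote(str(value), safe="/:")
        parts.append(f"{name}={encoded}")
    return f"http://{host}:{port}/action=AddTask&" + "&".join(parts)


# Parameter names and values are taken from the example action above.
url = build_add_task_url(
    "localhost",
    13000,
    {
        "Type": "ClusterSpeechToTextTel",
        "File": "C:/myData/Speech.wav",
        "Out": "SpeechTranscript.ctm",
        "Lang": "ENUS",
    },
)
print(url)
```

Sending the resulting URL, for example with `urllib.request.urlopen`, requires a running server listening on the given port, so that step is left out of the sketch.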