ClusterSpeechTel

The ClusterSpeechTel task clusters telephony speech into speaker segments. For example, if the task identifies two speaker clusters, it labels them Cluster_0 and Cluster_1.
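
The labeling scheme can be illustrated with a short sketch that totals speaking time per cluster label. It assumes a hypothetical segment representation of (start_seconds, end_seconds, cluster_index) tuples; this page does not describe the actual format of the task output, so the tuple layout and the group_by_cluster name are illustrative only.

```python
from collections import defaultdict


def group_by_cluster(segments):
    """Total the duration of each speaker cluster.

    `segments` uses a HYPOTHETICAL (start_s, end_s, cluster_index) layout;
    it is not the documented ClusterSpeechTel output format.
    """
    totals = defaultdict(float)
    for start, end, index in segments:
        # Name each cluster following the Cluster_0, Cluster_1, ... scheme.
        totals[f"Cluster_{index}"] += end - start
    return dict(totals)


if __name__ == "__main__":
    segs = [(0.0, 4.5, 0), (4.5, 9.0, 1), (9.0, 12.0, 0)]
    print(group_by_cluster(segs))  # {'Cluster_0': 7.5, 'Cluster_1': 4.5}
```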

Parameters

Parameter           Description                                                     Required
Type                The task name. Set to ClusterSpeechTel.                         Yes
File                The input audio file.
FixTime             A fixed size for speaker clusters.
MaxNumSpeakers      The final maximum number of speakers to produce.
MergeThresh         The threshold below which to merge clusters.
MinNumSpeakers      The final minimum number of speakers to produce.
NormFile            The acoustic normalization file to use.
Out                 The file that IDOL Speech Server writes task output to.
SampleFrequency     The sample frequency of the audio file to process.
SilThresh           The threshold that separates silence from non-silence.
SpeechThresh        The threshold between speech and non-speech (music or noise).
SugdInputChannels   The channel layout of the input media file.
SugdInputFrequency  The sampling rate of the input media file.
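
Before submitting a task, a client might sanity-check its parameters against the table above. The sketch below is a hypothetical client-side helper, not part of IDOL Speech Server: it checks that Type (the only parameter marked as required) is set to ClusterSpeechTel and, as an assumption, treats a MinNumSpeakers value greater than MaxNumSpeakers as contradictory.

```python
def check_cluster_params(params):
    """Return a list of problems found in a ClusterSpeechTel parameter dict.

    HYPOTHETICAL helper; IDOL Speech Server performs its own validation.
    """
    problems = []
    # Type is the only parameter the table marks as required.
    if params.get("Type") != "ClusterSpeechTel":
        problems.append("Type must be ClusterSpeechTel")
    low = params.get("MinNumSpeakers")
    high = params.get("MaxNumSpeakers")
    # Assumption: a minimum speaker count above the maximum is contradictory.
    if low is not None and high is not None and int(low) > int(high):
        problems.append("MinNumSpeakers exceeds MaxNumSpeakers")
    return problems
```

For example, check_cluster_params({"Type": "ClusterSpeechTel", "MinNumSpeakers": "2", "MaxNumSpeakers": "4"}) returns an empty list, while swapping the two speaker counts reports one problem. Values are converted with int() because query-string parameters usually arrive as strings.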

Example

http://localhost:13000/action=AddTask&Type=ClusterSpeechTel&File=1h.wav&Lang=ENUK-tel&Out=outTel

This action uses port 13000 to instruct IDOL Speech Server on the local machine to cluster the data in the 1h.wav telephony audio file into speaker segments and to write the results to the outTel output file.
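
The example request can also be assembled programmatically. The following is a minimal sketch, assuming the same URL shape as the example above (the action and parameters follow the port, joined with &); the add_task_url name is hypothetical. Each value is percent-encoded so that file names containing spaces or & characters do not break the query.

```python
from urllib.parse import quote


def add_task_url(host, port, **params):
    """Build an AddTask URL for ClusterSpeechTel (HYPOTHETICAL helper)."""
    # Type always comes first and is fixed to this task.
    params = {"Type": "ClusterSpeechTel", **params}
    # Percent-encode every value; safe='' also encodes '/' and '&'.
    query = "&".join(f"{key}={quote(str(value), safe='')}" for key, value in params.items())
    return f"http://{host}:{port}/action=AddTask&{query}"


if __name__ == "__main__":
    print(add_task_url("localhost", 13000, File="1h.wav", Lang="ENUK-tel", Out="outTel"))
```

Run as shown, this reproduces the example URL exactly. Passing the resulting string to urllib.request.urlopen would submit it to a running server; that step is left out here because it needs a live IDOL Speech Server.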
