The following tasks are available out of the box.
Task | Description |
---|---|
AfpAddTrackStream | Adds a new audio track to an Audio Fingerprint database, receiving the audio data as a stream, and converting it into AFP features before indexing. |
AfpAddTrackWav | Adds a new audio track to an Audio Fingerprint database, reading the data from an audio file, and converting it into AFP features before indexing. |
AfpDatabaseInfo | Returns a list of all tracks that are currently stored within the specified Audio Fingerprint database. |
AfpDatabaseOptimize | Optimizes the internal indexing of the specified Audio Fingerprint database. This task permanently removes files that have been tagged for deletion using the AfpRemoveTrack task, and optimizes lookup functions for newly added tracks. |
AfpMatchStream | Receives audio data as a binary stream and searches it for any sections that match audio indexed in an AFP database. |
AfpMatchWav | Reads in data from an audio file, and searches it for any sections that match audio indexed in an AFP database. |
AfpRemoveTrack | Removes specified audio tracks from an AFP database. |
AmTrain | Presents training audio and transcription data to the acoustic model training process, creating accumulator files that are used by the AmTrainFinal task to produce a final adapted acoustic model. |
AmTrainFinal | Produces the adapted acoustic model, given a set of accumulator files created by the AmTrain task. |
AudioAnalysis | Runs all the audio preprocessing tasks that are supported by the audiopreproc module in a single task. |
AudioSecurity | Detects and labels segments of audio containing alarms, screams, breaking glass, or gunshots. |
ClippingDetection | Detects clipping in audio data. |
CombineFMD | Combines several phoneme time track files into a single file, which can then be used for phonetic phrase match. |
DataObfuscation | Prepares training data with any sensitive or classified information concealed. |
DialToneIdentification | Detects and identifies DTMF dial tones in audio data. |
LangIdBndLif | Reads in language identification features from file and determines boundaries in the feature sequence where the language changes. Returns the language identification results between boundaries. |
LangIdBndStream | Receives audio data as a binary stream, converts it into language identification features, and determines boundaries where the language changes. Returns the language identification results between boundaries. |
LangIdBndWav | Reads in data from an audio file, converts it into language identification features, and determines boundaries where the language changes. Returns the language identification results between boundaries. |
LangIdCumLif | Reads in language identification features from file. Returns the running language identification score at periodic intervals. This is the score for all the input data from the start to the current point. |
LangIdCumStream | Receives audio data as a binary stream and converts it into language identification features. Returns the running language identification score at periodic intervals. This is the score for all the input data from the start to the current point. |
LangIdCumWav | Reads in data from an audio file and converts it into language identification features. Returns the running language identification score at periodic intervals. This is the score for all the input data from the start to the current point. |
LangIdFeature | Converts audio files containing the relevant language into language identification feature (.lif) files, which are required for training language classifiers. |
LangIdOptimize | Optimizes the balance between language classifiers in a classifier set. |
LangIdSegLif | Reads in language identification features from file, processes the data in fixed-sized chunks, and returns the language identification results for each chunk. |
LangIdSegStream | Receives audio data as a binary stream and converts it into language identification features. IDOL Speech Server processes the data in fixed-sized chunks, and returns the language identification results for each chunk. |
LangIdSegWav | Reads in data from an audio file and converts it into language identification features. IDOL Speech Server processes the data in fixed-sized chunks, and returns the language identification results for each chunk. |
LangIdTrain | Reads in a set of language identification feature files created from audio representing a single language (using the LangIdFeature task), and uses this data to train a new language classifier. |
LanguageModelBuild | Builds a new language model from a set of text files. |
LmListVocab | Lists the most common words in the specified language model. |
LmLookUp | Verifies whether a specified word is present in the vocabulary of a particular language model and, if so, how frequently the word occurs. |
LmPerplexity | Analyzes the perplexity of a sample text file, when given a specific language model. |
Scorer | Scores a speech recognition transcript (such as that generated by the SpeechToText task), when given a reference transcript file. |
SearchFMD | Searches for specified phrases in a phoneme time track file. |
SegmentText | Inserts whitespace between words in a text file (for languages that do not separate words with whitespace). |
SegmentWav | Attempts to segment audio into sections by speaker even if no trained speakers exist in the system. |
SidPackage - Deprecated | Packages a set of trained speaker models into a single speaker classification file. |
SidTrain - Deprecated | Uses an audio file to create or update a speaker training file. |
SidTrainFinal - Deprecated | Uses a base model and one or more speaker training files to generate a speaker template file. |
SNRCalculation | Analyzes the signal-to-noise levels across an audio file. |
SpeechSilClassification | Segments audio by content type: speech, non-speech, or music. |
SpkIdDevel | Processes speaker ID feature files to generate scores for tuning model thresholds. |
SpkIdDevelFinal | Estimates the thresholds for a set of speaker templates. |
SpkIdDevelStream | Creates or updates a development (.atd) file for an audio stream. |
SpkIdDevelWav | Creates or updates a development (.atd) file for an audio file. |
SpkIdEvalStream | Analyzes an audio stream to identify any sections where the trained speakers are present. |
SpkIdEvalWav | Analyzes an audio file to identify any sections where the trained speakers are present. |
SpkIdFeature | Creates a speaker ID feature file. |
SpkIdSetAdd | Takes one or more audio template files, and adds them to an audio template set file. |
SpkIdSetDelete | Removes a template from an audio template set file. |
SpkIdSetInfo | Retrieves information on an audio template set file. |
SpkIdTrain | Uses one or more feature files to train a speaker template. |
SpkIdTrainStream | Takes an audio stream containing speech data from the speaker to be trained, and creates a new speaker template file. |
SpkIdTrainWav | Takes a single audio file containing speech data from the speaker to be trained, and creates a new speaker template file. |
StreamSidOptimize | Receives sample audio data for a trained (or untrained) speaker from a binary stream file, and updates statistics used in calculating speaker thresholds across the whole speaker classifier set. |
StreamSidTrain | Receives sample audio data for a specific speaker from a binary stream, and creates a speaker model to represent the speaker. |
StreamSpeakerId - Deprecated | Segments an audio stream by speaker and identifies known speakers, unknown speakers, and periods of non-speech within the audio. To run the StreamSpeakerId task, speakers must first be trained in Speech Server. |
StreamToText | Converts live audio into a text transcript. |
TelWavToText | Transcribes a telephony audio file, including dial tones and DTMF dial tones. |
TextNorm | Takes a raw text transcription file and produces a normalized form (by removing punctuation, rewriting numbers as words, altering word cases, and so on). |
TranscriptAlign | Places time locations for each word in an existing transcript of an audio recording. This task is suitable for aligning subtitles to audio or video files. |
TranscriptCheck | Checks how well a text transcript matches the audio data, identifying large missing or erroneous sections. |
WavPhraseSearch | Searches for a specified phrase or phrases in an audio file. |
WavSidOptimize - Deprecated | Reads in sample audio for a trained (or untrained) speaker from an audio file, and updates statistics used in calculating speaker thresholds across the whole speaker classifier set. |
WavSidTrain - Deprecated | Reads in sample audio for a specific speaker from an audio file and creates a speaker model to represent the speaker. |
WavSpeakerId - Deprecated | Segments an audio file by speaker and identifies known speakers, unknown speakers, and periods of non-speech within the audio file. To run the WavSpeakerId task, speakers must first be trained in Speech Server. |
WavToFMD | Creates a phoneme time track file from a single audio file. |
WavToPlh | Reads data from an audio file and produces an audio feature file, which is used in tasks such as AmTrain (which adapts acoustic models). |
WavToText | Converts an audio file into a text transcript. |
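
The task names in the table are the values you pass when you ask the server to run a task. As a minimal sketch, the following Python builds such a request URL. It assumes that tasks are started with an ACI-style `action=AddTask` request whose `Type` parameter holds the task name. The host, port, file names, and the `File` and `Out` parameters are illustrative assumptions, not values taken from this table, so check them against the reference for the task you want to run.

```python
from urllib.parse import urlencode


def build_add_task_url(host, port, task_type, **params):
    """Build a URL that asks the server to queue one task.

    Assumption: tasks are started with an ACI-style `action=AddTask`
    request, and `Type` names one of the tasks listed in the table.
    Any extra keyword arguments are passed through as task parameters.
    """
    query = urlencode({"Type": task_type, **params})
    return f"http://{host}:{port}/action=AddTask&{query}"


# Hypothetical host, port, and file names, used only for illustration.
url = build_add_task_url(
    "localhost", 13000, "WavToText", File="speech.wav", Out="speech.ctm"
)
print(url)
```

The helper only assembles the URL and does not send it, so it can be run without a server. `urlencode` percent-encodes parameter values, which matters when file paths contain spaces or backslashes.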