Standard Tasks

This table describes the standard tasks that are already defined in the tasks configuration file. You can run any of these tasks straight out of the box.

Task Description
AfpAddTrackStream Adds a new audio track to an audio fingerprint database, receiving the audio data as a stream, and converting to AFP features before indexing.
AfpAddTrackWav Adds a new audio track to an audio fingerprint database, reading the data from an audio file, and converting to AFP features before indexing.
AfpDatabaseInfo Returns a list of all tracks that are currently stored in the specified database.
AfpDatabaseOptimize Optimizes the internal indexing of the specified database. This task permanently removes files that have been tagged for deletion using the AfpRemoveTrack task, and optimizes lookup functions for newly added tracks.
AfpMatchStream Receives audio data as a binary stream, and searches it for any indexed audio sections.
AfpMatchWav Reads in data from an audio file, and searches it for any indexed audio sections.
AfpRemoveTrack Removes specified tracks from an audio fingerprint database.
AmTrain Presents training audio and transcription data to the acoustic model training process, and creates accumulator files that are used to produce a final adapted acoustic model.
AmTrainFinal Produces the adapted acoustic model, given a set of accumulator files created by the AmTrain task.
AudioAnalysis Runs all the audio preprocessing tasks that are supported by the audiopreproc module in a single task.
AudioSecurity Detects and labels segments of audio that contain alarms, screams, breaking glass, or gunshots.
ClippingDetection Analyzes an audio file to detect audio clipping.
CombineFMD Combines several phoneme track files, which can then be used for phrase search.
DataObfuscation Prepares training data with any sensitive or classified information concealed.
DialToneIdentification Detects and identifies dial tones in audio.
LangIdBndLif Reads in language ID features from file, and determines boundaries in the feature sequence where the language changes. Returns the language identification results between boundaries.
LangIdBndStream Receives audio data as a binary stream, converts the audio into language ID features, and determines boundaries where the language changes. Returns the language identification results between boundaries.
LangIdBndWav Reads in data from an audio file, converts it into language ID features, and determines boundaries where the language changes. Returns the language identification results between boundaries.
LangIdCumLif Reads in language ID features from file, and returns the running language identification score at periodic intervals (that is, the score for all the input data from the start to the current point).
LangIdCumStream Receives audio data as a binary stream, and converts it into language ID features. Returns the running language identification score at periodic intervals (that is, the score for all the input data from the start to the current point).
LangIdCumWav Reads in data from an audio file, and converts it into language ID features. Returns the running language identification score at periodic intervals (that is, the score for all the input data from the start to the current point).
LangIdFeature Converts audio files in the relevant language into language identification feature (.lif) files, which are required for training classifiers.
LangIdOptimize Optimizes the balance between language classifiers. After training, some classifiers might be stronger than others because of properties of the training material and the languages in question. The optimization process weights the language models so that weaker languages have increased accuracy, without compromising accuracy for stronger language models. This process makes performance more consistent across languages.
LangIdSegLif Reads in language ID features from file, processes the data in fixed-sized chunks, and returns the language identification results for each chunk.
LangIdSegStream Receives audio data as a binary stream, and converts it into language ID features. Processes the data in fixed-sized chunks, and returns the language identification results for each chunk.
LangIdSegWav Reads in data from an audio file and converts it into language ID features. Processes the data in fixed-sized chunks and returns the language identification results for each chunk.
LangIdTrain Reads in a set of language ID feature files created from audio representing a single language (using the LangIdFeature task), and uses this data to train a new language classifier.
LanguageModelBuild Builds a new language model from a set of text files.
LMListVocab Lists the most common words in the specified language model.
LMLookUp Verifies whether a specified word is present in the vocabulary of a particular language model, and, if it is present, how frequently it occurs.
LMPerplexity Analyzes the perplexity of a sample text file, when given a specific language model.
Scorer Scores the recognition transcript (such as that generated by the SpeechToText task), when given a reference transcript file.
SearchFMD Searches a phoneme track file for one or more specified phrases.
SegmentText Inserts whitespace between words in a text file (for languages that do not separate words with whitespace).
SegmentWav Attempts to segment audio into sections by speaker even if no trained speakers exist in the system.
SidPackage

Packages a set of trained speaker models into a single speaker classification file.

Deprecated: The SidPackage task is deprecated for IDOL Server version 11.0.0. HPE recommends that you use the SpkIdDevelFinal task instead.

This task is still available for existing implementations, but it might be incompatible with new functionality. The task might be deleted in future.

SidTrain

Takes an audio file and a base model (by default, the USM model), and writes a speaker training parameter (SPT) file.

Deprecated: The SidTrain task is deprecated for IDOL Server version 11.0.0. HPE recommends that you use the SpkIdTrain task instead.

This task is still available for existing implementations, but it might be incompatible with new functionality. The task might be deleted in future.

SidTrainFinal

Takes one or more SPT files and the base model, and produces a new speaker model.

Deprecated: The SidTrainFinal task is deprecated for IDOL Server version 11.0.0. HPE recommends that you use the SpkIdTrain task instead.

This task is still available for existing implementations, but it might be incompatible with new functionality. The task might be deleted in future.

SNRCalculation Calculates signal-to-noise ratio (SNR) levels across an audio file.
SpeechSilClassification Segments an audio file into sections of speech, non-speech, and music.
SpkIdDevel Processes speaker ID feature files to generate scores for tuning model thresholds.
SpkIdDevelFinal Estimates the thresholds for a set of speaker templates.
SpkIdDevelStream Creates or updates a development (.atd) file for an audio stream.
SpkIdDevelWav Creates or updates a development (.atd) file for an audio file.
SpkIdEvalStream Analyzes an audio stream to identify any sections where the trained speakers are present.
SpkIdEvalWav Analyzes an audio file to identify any sections where the trained speakers are present.
SpkIdFeature Creates a speaker ID feature file.
SpkIdSetAdd Takes one or more audio template files, and adds them to an audio template set file.
SpkIdSetDelete Removes a template from an audio template set file.
SpkIdSetInfo Retrieves information on an audio template set file.
SpkIdTrain Uses one or more feature files to train a speaker template.
SpkIdTrainStream Takes an audio stream containing speech data from the speaker to be trained, and creates a new speaker template file.
SpkIdTrainWav Takes a single audio file containing speech data from the speaker to be trained, and creates a new speaker template file.
StreamSidOptimize Receives sample audio data for a trained (or untrained) speaker from a binary stream file, and updates statistics used to calculate speaker thresholds across the whole speaker classifier set.
StreamSidTrain Receives sample audio data for a specific speaker from a binary stream, and creates a speaker model to represent the speaker.
StreamSpeakerId

Segments an audio stream by speaker and identifies known speakers, unknown speakers, and periods of non-speech within the audio. To run the StreamSpeakerId task, the speakers must already be trained in Speech Server.

Deprecated: The StreamSpeakerId task is deprecated for IDOL Server version 11.0.0. HPE recommends that you use the SpkIdEvalStream task instead.

This task is still available for existing implementations, but it might be incompatible with new functionality. The task might be deleted in future.

StreamToText Converts live audio into a text transcript.
StreamToTextMusicFilter Converts live audio into a text transcript and categorizes the audio so that you can remove any sections consisting of music or noise.
TelWavToText Transcribes a telephony audio file, including dial tones and DTMF dial tones.
TextNorm Takes a raw text transcription file and produces a normalized form (by removing punctuation, rewriting numbers as words, altering word cases, and so on).
TranscriptAlign Aligns an existing transcript with its audio recording by placing time locations for each word in the transcript. You can use this task to align subtitles with audio or video files.
TranscriptCheck Checks how well a text transcript matches the audio data, and identifies large missing or erroneous sections.
WavPhraseSearch Searches for a specified phrase or phrases in an audio file.
WavSidOptimize

Reads in sample audio for a trained (or untrained) speaker from an audio file, and updates statistics used to calculate speaker thresholds across the whole speaker classifier set.

Deprecated: The WavSidOptimize task is deprecated for IDOL Server version 11.0.0. HPE recommends that you use the SpkIdDevelWav task instead.

This task is still available for existing implementations, but it might be incompatible with new functionality. The task might be deleted in future.

WavSidTrain

Reads in sample audio for a specific speaker from an audio file, and creates a speaker model to represent the speaker.

Deprecated: The WavSidTrain task is deprecated for IDOL Server version 11.0.0. HPE recommends that you use the SpkIdTrainWav task instead.

This task is still available for existing implementations, but it might be incompatible with new functionality. The task might be deleted in future.

WavSpeakerId

Segments audio by speaker and identifies known speakers, unknown speakers, and periods of non-speech within the audio file. To run the WavSpeakerId task, the speakers must already be trained in Speech Server.

Deprecated: The WavSpeakerId task is deprecated for IDOL Server version 11.0.0. HPE recommends that you use the SpkIdEvalWav task instead.

This task is still available for existing implementations, but it might be incompatible with new functionality. The task might be deleted in future.

WavToFMD Creates a phoneme time track (.fmd) file from a single audio file.
WavToPlh Reads data from an audio file and produces an audio feature (.plh) file, such as those used in the acoustic model adaptation process (the AmTrain task).
WavToText

Converts an audio file into a text transcript.

Note: To use WavToText to submit audio data as a binary data block for speech-to-text, submit the task data without specifying a .wav file.

For details about each task, including the required action and configuration parameters, see the IDOL Speech Server Reference.
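
As a rough illustration of how the task names in the table might be used from a client script, the following Python sketch does two things. It records the deprecated tasks and their recommended replacements exactly as listed in the table, and it builds a request URL for a task. The request form is an assumption: it supposes that tasks are started with an AddTask action that takes the task name in a Type parameter, and that the server listens on localhost port 13000. The File parameter and the function names are hypothetical and used only for illustration; check the IDOL Speech Server Reference for the parameters that each task actually requires.

```python
from urllib.parse import urlencode

# Deprecated tasks and their recommended replacements, as listed in the table.
DEPRECATED_TASKS = {
    "SidPackage": "SpkIdDevelFinal",
    "SidTrain": "SpkIdTrain",
    "SidTrainFinal": "SpkIdTrain",
    "StreamSpeakerId": "SpkIdEvalStream",
    "WavSidOptimize": "SpkIdDevelWav",
    "WavSidTrain": "SpkIdTrainWav",
    "WavSpeakerId": "SpkIdEvalWav",
}


def recommended_task(task_name):
    """Return the recommended replacement for a deprecated task.

    Task names that are not deprecated are returned unchanged.
    """
    return DEPRECATED_TASKS.get(task_name, task_name)


def build_task_url(task_name, host="localhost", port=13000, **params):
    """Build a request URL for a standard task.

    Assumption: the server accepts action=AddTask with the task name in
    the Type parameter. The host, port, and any extra parameters passed
    in (for example File) are illustrative, not taken from this table.
    """
    query = {"action": "AddTask", "Type": task_name}
    query.update(params)
    return f"http://{host}:{port}/action={urlencode(query)[len('action='):]}"


if __name__ == "__main__":
    requested = "WavSpeakerId"
    task = recommended_task(requested)
    if task != requested:
        print(f"{requested} is deprecated; consider {task} instead.")
    print(build_task_url(task, File="sample.wav"))
```

The lookup does not rewrite requests automatically, because a replacement task can require different parameters from the task it replaces.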

