After you have selected and prepared the training text files, you can build the custom language model.
To build the language model
Create a list that contains the file names (including file extensions) of all training text files. You do not have to include the file paths because you can use the DataPath parameter to specify the directory path in the next step.
For more information about IDOL Speech Server's list manager, see Create and Manage Lists.
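The file names for the list can be collected with a short script. The following is a minimal sketch, not part of IDOL Speech Server: the function name `training_file_names` and the assumption that training files use the `.txt` extension are illustrative only. It returns bare file names with extensions and no directory paths, as this step requires. Loading the names into the list manager is covered in Create and Manage Lists.

```python
from pathlib import Path


def training_file_names(directory, suffixes=(".txt",)):
    """Return the sorted names of training text files in a directory.

    Names include the file extension but not the directory path.
    The default ".txt" suffix is an assumption; adjust it to match your files.
    """
    return sorted(
        entry.name
        for entry in Path(directory).iterdir()
        if entry.is_file() and entry.suffix.lower() in suffixes
    )
```

For example, `training_file_names(r"C:\LanguageModelFiles")` would return the names of the `.txt` files in that directory, ready to be added to a list.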
Send an AddTask action to IDOL Speech Server, and set the following parameters:
| Parameter | Description |
|---|---|
| Type | The task name. Set to LanguageModelBuild. |
| DataList | The list that specifies the training text files. |
| DataPath | The path to the directory that contains the files specified in the DataList parameter. |
| KeepList | The path to a file that contains a list of words that the language model must contain. For more information on the format of the file, see the IDOL Speech Server Reference. |
| Lang | The language pack to use as a base (for example, ENUK-tel). |
| NewLanguageModel | The name to give the custom language model that is generated. |
| NewDictionary | The name of the dictionary to generate; usually it is the same value as NewLanguageModel. |
| DoSmoothing | If you are using a custom language model for a transcript alignment task, set DoSmoothing to False. Otherwise, you can use the default value of True. |
If the training text files contain text in Japanese, Korean, Mandarin, or Taiwanese Mandarin, also set the DoSegment parameter:

| Parameter | Description |
|---|---|
| DoSegment | Set to True to enable text segmentation. |
For example:
http://localhost:13000/action=AddTask&Type=LanguageModelBuild&KeepList=ListManager/KeepWordsList.txt&DataList=ListManager/Langmodel&DataPath=C:\LanguageModelFiles&Lang=ENUK-tel&NewLanguageModel=mymodel.tlm&NewDictionary=mymodel.dct.sz
This action uses port 13000 to instruct IDOL Speech Server, which is located on the local machine, to build a new language model (mymodel.tlm) and dictionary file (mymodel.dct.sz) from the training text specified in the Langmodel list and the ENUK-tel language pack. The language model that the task produces must contain the words in the KeepWordsList text file. This action also calculates a recommended interpolation weight at the end of the language model building process.
Note: The interpolation weight is only a suggestion; you can choose to set other weights.
The new language models are placed in the custom language models folder that is specified by the CustomLmDir parameter in the IDOL Speech Server configuration file.
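The example request shown earlier can also be assembled programmatically. The sketch below is illustrative rather than an official client: the helper name `build_language_model_url` is hypothetical, and it only builds the URL string without sending it. It follows the URL shape of the example (the action comes directly after the slash) and percent-encodes parameter values, so a Windows path such as `C:\LanguageModelFiles` appears as `C%3A%5CLanguageModelFiles`.

```python
from urllib.parse import quote


def build_language_model_url(host, port, **params):
    """Assemble an AddTask URL for the LanguageModelBuild task.

    Keyword arguments become request parameters in the order given.
    Parameters set to None are omitted, so optional ones such as
    DoSmoothing or DoSegment can be left out.
    """
    pairs = [("action", "AddTask"), ("Type", "LanguageModelBuild")]
    pairs += [(name, str(value)) for name, value in params.items() if value is not None]
    # Percent-encode each value; "/" is left as-is to keep list paths readable.
    query = "&".join(f"{name}={quote(value, safe='/')}" for name, value in pairs)
    return f"http://{host}:{port}/{query}"


url = build_language_model_url(
    "localhost",
    13000,
    KeepList="ListManager/KeepWordsList.txt",
    DataList="ListManager/Langmodel",
    DataPath="C:\\LanguageModelFiles",
    Lang="ENUK-tel",
    NewLanguageModel="mymodel.tlm",
    NewDictionary="mymodel.dct.sz",
)
print(url)
```

For a transcript alignment model, passing `DoSmoothing=False` appends `DoSmoothing=False` to the request; for training text that needs segmentation, passing `DoSegment=True` appends `DoSegment=True`.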
This action returns a token. You can use the token with the GetResults action to retrieve the recommended interpolation weight for the custom language model. See Find Recommended Language Resources to Use in a Task.
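A follow-up GetResults request might be built as in the sketch below. This is a sketch under stated assumptions: the helper name `build_getresults_url` is hypothetical, the token value is a placeholder, and the sketch assumes that the token is passed in a parameter named Token. Check the IDOL Speech Server Reference for the exact parameters of the GetResults action.

```python
from urllib.parse import quote


def build_getresults_url(host, port, token):
    """Build a GetResults URL for the token returned by an AddTask action.

    Assumes the token is sent in a parameter named "Token". The token is
    percent-encoded in full because tokens can contain characters such as "=".
    """
    return f"http://{host}:{port}/action=GetResults&Token={quote(token, safe='')}"


# "abc123==" is a placeholder, not a real token.
results_url = build_getresults_url("localhost", 13000, "abc123==")
print(results_url)
```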