You can override the default stemming rules for certain words in a particular language by creating a language-specific stemming file. This file is a list of words and their stems. If a stemming file exists, IDOL Server uses it to stem the terms that it contains. Terms that are not in the file stem according to the default stemming rules.
HPE recommends use of a stemming file only for unusual or specialized terms where the default rules do not generate a stem. A stemming file is not intended to be a complete replacement for the IDOL stemming algorithms.
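The override behavior described above can be illustrated with a minimal Python sketch. It is not IDOL Server code: `default_stem` is a hypothetical toy suffix stripper standing in for the default stemming rules, and `stem` and `overrides` are illustrative names. The sketch assumes an exact match on the term and shows only the lookup order: the stemming file first, then the default rules.

```python
def default_stem(term: str) -> str:
    """Toy stand-in for the default stemming rules (NOT the IDOL algorithm)."""
    for suffix in ("ing", "es", "s"):
        # Strip a known suffix only if a reasonably long root remains.
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term


def stem(term: str, overrides: dict[str, str]) -> str:
    """Return the stemming-file entry if one exists, else the default stem."""
    if term in overrides:
        return overrides[term]
    return default_stem(term)


# Hypothetical in-memory copy of the word pairs in a stemming file.
overrides = {"mice": "mouse", "mouse": "mouse", "children": "child"}

print(stem("mice", overrides))     # found in the file, so the file's stem is used
print(stem("walking", overrides))  # not in the file, so the toy default rules apply
```

Keeping the override table small matches the recommendation above: only terms that the default rules handle poorly need an entry.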
Create a text file.
Format the file in the same way as a stop word list. The first line is an encoding designation. Each subsequent line contains one word pair: a term followed by its stem. For example:
[UTF8]
mice mouse
mouse mouse
children child
Note: To ensure that two words stem to the same value, you must add both words to the stemming file, with the appropriate stem.
Save the file with a name of your choice (for example, english_stem.dat) in the directory installDir/common/langfiles.
Open the IDOL Server configuration file. In the [MyLanguage] section for the stemming file language, set the StemmingFile configuration parameter to the name of your stemming file. For example:
[english]
Encodings=UTF8:englishUTF8
Stoplist=english.dat
StemmingFile=english_stem.dat
Ensure that this [MyLanguage] section does not set Stemming to False. The default value for Stemming in a language is True.
If you disable stemming for a language, but provide a stemming file, IDOL Server stems terms in the file, but does not stem other terms.
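Before deploying a stemming file, it can help to check it against the layout described in the steps above. The following Python sketch is one way to do that. It is not an IDOL Server tool, and `parse_stemming_file` and `unlisted_stems` are hypothetical helper names. It assumes that the first non-blank line is an encoding designation in square brackets and that every later line holds exactly two whitespace-separated words.

```python
def parse_stemming_file(text: str) -> tuple[str, dict[str, str]]:
    """Return (encoding, {term: stem}) parsed from stemming-file text.

    Assumed layout: an encoding designation such as [UTF8] first,
    then one "term stem" pair per line.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not (lines[0].startswith("[") and lines[0].endswith("]")):
        raise ValueError("first line must be an encoding designation, e.g. [UTF8]")
    encoding = lines[0][1:-1]
    pairs: dict[str, str] = {}
    for number, line in enumerate(lines[1:], start=2):
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"non-blank line {number}: expected 'term stem', got {line!r}")
        term, stem = parts
        pairs[term] = stem
    return encoding, pairs


def unlisted_stems(pairs: dict[str, str]) -> list[str]:
    """Return stems that are not also listed as terms mapping to themselves."""
    return sorted({s for s in pairs.values() if pairs.get(s) != s})


sample = "[UTF8]\nmice mouse\nmouse mouse\nchildren child\n"
encoding, pairs = parse_stemming_file(sample)
print(encoding, pairs)
print(unlisted_stems(pairs))  # stems you may also want to add as "stem stem" pairs
```

The second helper reflects the note in the procedure: a stem that has no entry of its own is stemmed by the default rules, so it might not produce the same value as the terms that map to it.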