Configure OCR

OCR has many configuration options that allow you to fine-tune its operation to improve accuracy. This section describes the basic settings that you need to consider before running OCR.

Which parts of an image are likely to be text depends on the context. To reflect this, Image Server has three different modes. You specify the mode using the OCRMode parameter in the configuration file.

Image Server supports two types of subtitle. By default, Image Server searches for single-color text against a plain, single-color background. You can also configure Image Server to search for black-bordered white letters that are superimposed directly onto the background TV image, which is a widely used type of subtitling. The Image Server configuration file refers to this type of subtitle as 'hollow text'.

You must use the Languages parameter to specify every language that you expect the text to be in. Image Server restricts its identification attempts to characters that are used by the specified languages. You can add extra characters to this character list (for example, rarer punctuation) using the WhiteList parameter. You can also further restrict the possible character choices, for example to a single case or to digits only, using the CharacterTypes parameter. In many cases, you know in advance that only a limited subset of characters will occur in the images (for clarity, many forms use only uppercase letters, digits, and limited punctuation). In this situation, reducing the list of characters that Image Server considers improves both accuracy and speed.
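The way these three parameters combine can be pictured as set operations on a list of candidate characters. The following Python sketch is a conceptual illustration only, not Image Server code: the alphabets, the character-type names, and the `candidate_characters` function are all hypothetical, and it assumes that the character-type restriction applies to whitelisted characters as well, which the text above does not confirm.

```python
import string

# Hypothetical per-language alphabets (assumption: a real OCR engine
# ships its own, much larger tables).
COMMON = string.digits + ".,;:!?'\"-()"
LANGUAGE_ALPHABETS = {
    "en": set(string.ascii_letters + COMMON),
    "de": set(string.ascii_letters + "äöüÄÖÜß" + COMMON),
}

# Hypothetical character-type names, used here for illustration only.
CHARACTER_TYPES = {
    "uppercase": str.isupper,
    "lowercase": str.islower,
    "digit": str.isdigit,
    "punctuation": lambda c: not c.isalnum(),
}


def candidate_characters(languages, whitelist="", character_types=None):
    """Return the set of characters a recognizer would consider."""
    allowed = set()
    # Start from the union of the alphabets of all expected languages.
    for lang in languages:
        allowed |= LANGUAGE_ALPHABETS[lang]
    # Add any extra characters, such as rarer punctuation.
    allowed |= set(whitelist)
    # Optionally keep only characters that match a requested type
    # (assumption: this filter also applies to whitelisted characters).
    if character_types:
        predicates = [CHARACTER_TYPES[name] for name in character_types]
        allowed = {c for c in allowed if any(p(c) for p in predicates)}
    return allowed


if __name__ == "__main__":
    full = candidate_characters(["en"])
    form = candidate_characters(["en"], character_types=["uppercase", "digit"])
    print(len(full), "candidates without restriction")
    print(len(form), "candidates for an uppercase-and-digits form")
```

For a form that uses only uppercase letters and digits, the restricted set is a small fraction of the full English set, which suggests why a narrower character list leaves fewer chances to confuse similar shapes such as `O`, `o`, and `0`.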

