Content Component
Content Component is an ACI server. For details of changes that affect all ACI servers, see ACI Server Framework.
23.4.0
New in this Release
-
Content now allows you to collect basic statistics about the distribution and data content of your document fields, such as:
-
The total number of individual occurrences of each field, and the number of distinct documents each appears in.
-
How many distinct values are observed for each field.
-
Occurrence counts for values that might be parsed as numeric, date, or geographic values.
-
Distribution (minimum, maximum, and mean) of numeric and date values, and value lengths, for each field.
You can configure statistics collection by setting the new
CollectFieldStatistics
configuration parameter in the [Server]
section. For an existing index, you can also use the new
RegenerateFieldStatsIndex
configuration parameter to generate field statistics at startup.
When you enable field statistics:
-
You can retrieve statistics for each field by using the
GetTagNames
action with the new FieldStats
parameter set to True.
-
The
Suggest
action uses structural information from the source documents as well as the unstructured information to find relevant documents.
-
The
TermGetBest
action can return information about the occurrences of structured field and value pairs, when you set the new FieldStats
parameter to True.
-
The
GetQueryTagValues
action can return total occurrences for non-parametric fields (that is, when you set both AllowNonParametricFields
and DocumentCount
to True).
-
You can now index vector values into Content, and use these for queries. A vector in a document is a comma-separated list of floating point values. You can generate vectors by using many different models. Content can then use these vectors to find documents that are similar to a vector value that you use in the new
VECTOR
operator in a Query
action, or to perform Suggest
queries.
To configure Content to process and use vector values, you must use the new
VectorType
field property for the field that contains the vector values. You can update an existing index to use these vector values by setting the RegenerateVectorIndex
configuration parameter, or by using the DREREGENERATE
index action.
You can configure the method used to determine how close vectors are to one another by setting the
DistanceMetric
parameter in the [VectorIndex]
configuration section. You can also change the directory that Content uses to store the vector index files by setting the VectorPath
parameter in the [Paths]
section.
For more information, refer to the IDOL Content Component Help.
-
The spellcheck phase of a query now respects timeouts.
-
Indexing performance has been improved when sending documents to Content in small batches.
-
The mapped security library has been updated. The security type
AUTONOMY_SECURITY_V4_TRIM_CONTEXT_EXT_MAPPED
(for Content Manager) now supports exclusions.
-
Performance has been improved for cases where several index actions are issued sequentially, with a pause to wait for each to complete before sending the next (for example, from a script or application that polls for a finished status between running each action).
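The field statistics options described above are request parameters on existing actions. As a rough sketch, the following Python helper builds request URLs for them. The host, port, helper name, and the AUTHOR field name are placeholders that do not come from these release notes, and any other parameters that an action requires are omitted.

```python
from urllib.parse import urlencode

# Placeholder host and ACI port; substitute your own Content server details.
CONTENT_BASE = "http://localhost:9100/"


def build_action_url(action, **params):
    """Build an ACI request URL of the form <base>action=<Action>&<Name>=<Value>."""
    return CONTENT_BASE + urlencode({"action": action, **params})


# Per-field statistics from GetTagNames.
tag_names_url = build_action_url("GetTagNames", FieldStats="True")

# Occurrence information for structured field and value pairs.
# Other parameters that TermGetBest needs are omitted here.
term_best_url = build_action_url("TermGetBest", FieldStats="True")

# Total occurrences for a non-parametric field
# (AUTHOR is a hypothetical field name used only for illustration).
tag_values_url = build_action_url(
    "GetQueryTagValues",
    FieldName="AUTHOR",
    AllowNonParametricFields="True",
    DocumentCount="True",
)

print(tag_names_url)
print(term_best_url)
print(tag_values_url)
```

These requests return statistics only when field statistics collection is enabled in the configuration.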
Resolved Issues
-
When requesting value details from a numeric field (with the
GetQueryTagValues
action and ValueDetails
set to True), results were sometimes missing from multi-section documents.
-
When Content was archiving index actions, and the index log stream was configured to report messages at Full log level, sending an index action with the
NoArchive
parameter set to True could cause an unexpected interruption of service.
-
Geospatial queries could time out when the
XMLFullStructure
configuration parameter was set to True and there were a large number of geospatial fields (more than approximately 10000).
23.3.0
New in this Release
-
The handling of reasons has been improved to merge overlapping reasons. For example, the query text
"James Watt" DNEAR Jr
previously gave the reasons James Watt and Watt Jr. It now returns the single reason James Watt Jr.
-
The efficiency of suggesting spelling corrections has been improved. This change gives particular improvements when
UnstemmedMinDocOccs
is configured to a value less than the current SpellCheckCorrectMinDocOccs
setting.
-
Several updates and improvements have been made to the BIAS FieldText operators:
-
The new
BIASRANGE
operator has been added. This operator allows you to bias the score of results that fall within a particular date range. It also allows you to reduce the score bias for values within a specified range outside this optimum range. For example:
BIASRANGE{21/08/2011,25/08/2011,172800,86400,10}:DATE
This example boosts the score by 10% for documents with a DATE field value in the range 21/08/2011 to 25/08/2011 (inclusive). It gives a smaller boost (on a linear scale) for documents within 172800s (two days) before 21/08/2011, and 86400s (one day) after.
-
The new
BIASNRANGE
operator has been added. This operator allows you to bias the score of results that contain a value within a specified range in a specified field, and to reduce the score bias linearly for values within a specified range outside this optimum range. For example:
FieldText=BIASNRANGE{100,150,20,40,10}:*/PRICE
A document whose
PRICE
field value is between 100 and 150 has its weight increased by 10%. This boost decreases linearly to 0% at 80 and lower, and 190 and higher.
-
The
BIASVAL
operator now supports an empty value for its first argument. For example, BIASVAL{,10}:COLOUR
applies a score boost to any result document that does not have a COLOUR
field, or has a COLOUR
field with an empty value.
NOTE:
BIASVAL
still requires two arguments, so BIASVAL{10}:COLOUR
is not valid syntax.
-
You can now use all BIAS field specifiers in
FieldTextField
fields for use with AgentBoolean queries (that is, BIAS, BIASDATE, BIASDISTCARTESIAN, BIASDISTSPHERICAL, BIASVAL, BIASRANGE, and BIASNRANGE are now supported for AgentBoolean queries).
-
You can now use an open-ended range in the
NRANGE
field operator by setting one of the values to a period (.). For example, NRANGE{.,5}:NUM
means that the NUM
field must contain a value of 5 or less.
-
The
GetQueryTagValues
value response when DocumentCount
is set to True now includes the total number of occurrences for each value in the server.
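To make the BIASNRANGE example above concrete, the following Python sketch models the boost profile that the text describes: the full boost inside the optimum range, falling linearly to zero across the margins on either side. It is an illustration only, not Content's implementation. The function and parameter names are hypothetical, and the interpolation is inferred from the endpoints given in the example (0% at 80 and at 190).

```python
def biasnrange_boost(value, low, high, below, above, boost):
    """Model the percentage boost for BIASNRANGE{low,high,below,above,boost}.

    Assumed behavior, taken from the release note example:
    - full boost when low <= value <= high
    - linear fall-off to 0 across `below` units under `low`
    - linear fall-off to 0 across `above` units over `high`
    """
    if low <= value <= high:
        return float(boost)
    if low - below < value < low:
        # Fraction of the way from the zero point (low - below) up to low.
        return boost * (value - (low - below)) / below
    if high < value < high + above:
        # Fraction of the way from the zero point (high + above) down to high.
        return boost * ((high + above) - value) / above
    return 0.0


# BIASNRANGE{100,150,20,40,10}:*/PRICE from the example above.
for price in (70, 80, 90, 100, 125, 150, 170, 190, 200):
    print(price, biasnrange_boost(price, 100, 150, 20, 40, 10))
```

Under the same assumption, the BIASRANGE date example follows this shape, with the range given as dates and the margins as seconds (172800 before and 86400 after).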
Resolved Issues
-
When used in conjunction with the
WHEN
operator in XML full-structure mode, the TERM
and TERMEXACT
FieldText specifiers failed to return some documents that should have matched.
-
The indexer thread could be blocked for an extended time when attempting to delete a file, if the target had been removed in the meantime by an external process.
-
When rebuilding the unstemmed index with
RegenerateUnstemmedIndex
, numeric/alphanumeric terms were sometimes excluded, regardless of the configured IndexNumbers
value.
-
The Content component NiFi processor,
ContentServiceImpl
, was unable to obtain a license correctly.
23.2.0
New in this Release
-
Loading has been optimized for
ACLType
fields that have also been configured as MemcachedType
(see NodetableCacheFields
).
NOTE: This change is only relevant to security models where the DLL load is required for evaluation.
-
The
QueryCacheMaxMemKB
configuration parameter has been added to the [Security]
section. Set this parameter to a value in KB to enable a per-query cache that speeds up security checks for cases where there are many non-unique ACLs in the system (for example, where security is inherited from a top-level folder). If the same ACL has already been evaluated during the query, Content does not need to call the security DLL again. You can set QueryCacheMaxMemKB
to -1 for an unlimited cache size, or 0 to disable the cache.
NOTE: This change is only relevant to security models where the DLL load is required for evaluation (for example, there is no need to use this parameter with NT_V4 security).
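The idea behind the per-query cache can be sketched in a few lines of Python. This is a conceptual illustration under stated assumptions, not how Content implements it. The class name, the check_acl callback that stands in for the security DLL call, and the crude memory accounting are all hypothetical. Only the meaning of the -1 and 0 settings comes from the text above.

```python
class PerQueryAclCache:
    """Remember ACL decisions for the lifetime of one query (illustrative only)."""

    def __init__(self, max_mem_kb, check_acl):
        # -1 means unlimited; 0 leaves no room, so nothing is ever cached.
        self.max_bytes = None if max_mem_kb < 0 else max_mem_kb * 1024
        self.check_acl = check_acl  # stand-in for the expensive security DLL call
        self.results = {}
        self.used_bytes = 0
        self.dll_calls = 0

    def is_permitted(self, acl):
        if acl in self.results:
            return self.results[acl]  # same ACL already evaluated in this query
        self.dll_calls += 1
        permitted = self.check_acl(acl)
        # Rough size estimate: the ACL string plus one byte for the result.
        cost = len(acl.encode("utf-8")) + 1
        if self.max_bytes is None or self.used_bytes + cost <= self.max_bytes:
            self.results[acl] = permitted
            self.used_bytes += cost
        return permitted


# Many documents inheriting one folder ACL trigger a single evaluation.
cache = PerQueryAclCache(-1, lambda acl: acl.startswith("allow"))
for _ in range(1000):
    cache.is_permitted("allow:top-level-folder")
print(cache.dll_calls)
```

The benefit depends on how often ACLs repeat within a result set, which is why the text singles out inherited folder security as the main use case.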
Resolved Issues
-
In some cases, Content failed to return hits for terms that existed only in the index cache and not in any indexed documents when
SearchUncommittedDocuments
was set to True.
-
Content could spuriously log an error
"Dynterm list is NULL for term"
. This error tended to occur for terms with a large number (millions) of occurrences, in servers where documents were regularly deleted and the index compacted.
-
When the Active Directory contained a group name that ends with a space character, the Content security index could become invalid after the component was restarted.
-
When the saved best terms cache file was not valid, the Content application could shut down during a
DRECOMPACT
operation. Content now automatically rebuilds the cache if it cannot load the saved file.