Parameters affecting how we generate terms:

Changing some of these parameters will imply a full reindex. Also, when using multiple indexes, it may not make sense to search indexes that don't share the values for these parameters, because they usually affect both search and index operations.

indexStripChars

Decide if we strip characters of diacritics and convert them to lower-case before terms are indexed. If we don't, searches sensitive to case and diacritics can be performed, but the index will be bigger, and some marginal weirdness may sometimes occur. The default is a stripped index (indexStripChars = 1) for now. When using multiple indexes for a search, this parameter must be defined identically for all. Changing the value implies an index reset.

maxTermExpand

Maximum expansion count for a single term (e.g.: when using wildcards). The default of 10000 is reasonable and will avoid queries that appear frozen while the engine is walking the term list.

maxXapianClauses

Maximum number of elementary clauses we can add to a single Xapian query. In some cases, the result of term expansion can be multiplicative, and we want to avoid using excessive memory. The default of 100 000 should be both high enough in most cases and compatible with current typical hardware configurations.

nonumbers

If this set to true, no terms will be generated for numbers. For example "123", "1.5e6", 192.168.1.4, would not be indexed ("value123" would still be). Numbers are often quite interesting to search for, and this should probably not be set except for special situations, ie, scientific documents with huge amounts of numbers in them. This can only be set for a whole index, not for a subtree.

dehyphenate

Determines if, given an input of co-worker, we add a term for coworker. This possibility is new in version 1.22, and on by default. Setting the variable to off allows restoring the previous behaviour.

nocjk

If this set to true, specific east asian (Chinese Korean Japanese) characters/word splitting is turned off. This will save a small amount of cpu if you have no CJK documents. If your document base does include such text but you are not interested in searching it, setting nocjk may be a significant time and space saver.

cjkngramlen

This lets you adjust the size of n-grams used for indexing CJK text. The default value of 2 is probably appropriate in most cases. A value of 3 would allow more precision and efficiency on longer words, but the index will be approximately twice as large.

indexstemminglanguages

A list of languages for which the stem expansion databases will be built. See recollindex(1) or use the recollindex -l command for possible values. You can add a stem expansion database for a different language by using recollindex -s, but it will be deleted during the next indexing. Only languages listed in the configuration file are permanent.

defaultcharset

The name of the character set used for files that do not contain a character set definition (ie: plain text files). This can be redefined for any sub-directory. If it is not set at all, the character set used is the one defined by the nls environment ( LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.

unac_except_trans

This is a list of characters, encoded in UTF-8, which should be handled specially when converting text to unaccented lowercase. For example, in Swedish, the letter a with diaeresis has full alphabet citizenship and should not be turned into an a. Each element in the space-separated list has the special character as first element and the translation following. The handling of both the lowercase and upper-case versions of a character should be specified, as appartenance to the list will turn-off both standard accent and case processing. Example for Swedish:

unac_except_trans =  åå Åå ää Ää öö Öö
            

Note that the translation is not limited to a single character, you could very well have something like üue in the list.

The default value set for unac_except_trans can't be listed here because I have trouble with SGML and UTF-8, but it only contains ligature decompositions: german ss, oe, ae, fi, fl.

This parameter can't be defined for subdirectories, it is global, because there is no way to do otherwise when querying. If you have document sets which would need different values, you will have to index and query them separately.

maildefcharset

This can be used to define the default character set specifically for email messages which don't specify it. This is mainly useful for readpst (libpst) dumps, which are utf-8 but do not say so.

localfields

This allows setting fields for all documents under a given directory. Typical usage would be to set an "rclaptg" field, to be used in mimeview to select a specific viewer. If several fields are to be set, they should be separated with a semi-colon (';') character, which there is currently no way to escape. Also note the initial semi-colon. Example: localfields= ;rclaptg=gnus;other = val, then select specifier viewer with mimetype|tag=... in mimeview.

testmodifusemtime

If true, use mtime instead of default ctime to determine if a file has been modified (in addition to size, which is always used). Setting this can reduce re-indexing on systems where extended attributes are modified (by some other application), but not indexed (changing extended attributes only affects ctime). Notes:

  • This may prevent detection of change in some marginal file rename cases (the target would need to have the same size and mtime).

  • You should probably also set noxattrfields to 1 in this case, except if you still prefer to perform xattr indexing, for example if the local file update pattern makes it of value (as in general, there is a risk for pure extended attributes updates without file modification to go undetected).

Perform a full index reset after changing the value of this parameter.

noxattrfields

Recoll versions 1.19 and later automatically translate file extended attributes into document fields (to be processed according to the parameters from the fields file). Setting this variable to 1 will disable the behaviour.

metadatacmds

This allows executing external commands for each file and storing the output in Recoll document fields. This could be used for example to index external tag data. The value is a list of field names and commands, don't forget an initial semi-colon. Example:

[/some/area/of/the/fs]
metadatacmds = ; tags = tmsu tags %f; otherfield = somecmd -xx %f
                

As a specially disgusting hack brought by Recoll 1.19.7, if a "field name" begins with rclmulti, the data returned by the command is expected to contain multiple field values, in configuration file format. This allows setting several fields by executing a single command. Example:

metadatacmds = ; rclmulti1 = somecmd %f
                

If somecmd returns data in the form of:

field1 = value1
field2 = value for field2
                

field1 and field2 will be set inside the document metadata.