Indexing is the process by which the set of documents is
analyzed and the data entered into the database. Recoll
indexing is normally incremental: documents will only be
processed if they have been modified since the last run. On
the first execution, all documents will need processing. A
full index build can be forced later by specifying an option
to the indexing command (recollindex -z or -Z).
recollindex skips files which caused an
error during a previous pass. This is a performance
optimization, new in version 1.21 (earlier versions always
retried failed files). Use the -k command line option to
retry failed files, for example after updating a filter.
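
The invocations described above can be assembled as in the following minimal Python sketch. The helper name recollindex_command and its parameters are hypothetical and exist only for illustration; the recollindex command and the -z, -Z and -k options are the ones named in the text. The sketch only builds the argument list and does not run the indexer.

```python
from typing import List, Optional


def recollindex_command(rebuild: Optional[str] = None,
                        retry_failed: bool = False) -> List[str]:
    """Build a recollindex argument list (hypothetical helper).

    rebuild: None for a normal incremental run, or "z" / "Z" to force
             a full index build with the -z or -Z option.
    retry_failed: add -k so that files which failed during a previous
                  pass are retried.
    """
    if rebuild not in (None, "z", "Z"):
        raise ValueError("rebuild must be None, 'z' or 'Z'")
    cmd = ["recollindex"]
    if rebuild is not None:
        cmd.append("-" + rebuild)
    if retry_failed:
        cmd.append("-k")
    return cmd


print(recollindex_command())                   # normal incremental run
print(recollindex_command(rebuild="z"))        # forced full index build
print(recollindex_command(retry_failed=True))  # retry previously failed files
```

On a system where recollindex is installed, the returned list could be passed to subprocess.run to start the indexing pass.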
The following sections give an overview of different aspects of the indexing process and its configuration, with links to detailed sections.
Depending on your data, temporary files may be needed during
indexing, some of them possibly quite big. You can use the
RECOLL_TMPDIR or TMPDIR environment variables to determine
where they are created (the default is to use /tmp). Using
TMPDIR has the nice property that it may also be taken into
account by auxiliary commands executed by recollindex.
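
As an illustration of the temporary directory setting, here is a minimal Python sketch that prepares the environment for an indexing run. The helper name indexing_environment, its parameters, and the example path are assumptions made for this sketch; only the RECOLL_TMPDIR and TMPDIR variable names come from the text.

```python
import os
from typing import Dict, Mapping, Optional


def indexing_environment(tmpdir: str,
                         base: Optional[Mapping[str, str]] = None,
                         recoll_only: bool = False) -> Dict[str, str]:
    """Return a copy of the environment with the temporary directory set.

    By default TMPDIR is set, so that auxiliary commands may also take
    it into account. With recoll_only=True, RECOLL_TMPDIR is set instead.
    """
    env = dict(os.environ if base is None else base)
    variable = "RECOLL_TMPDIR" if recoll_only else "TMPDIR"
    env[variable] = tmpdir
    return env


# Hypothetical location with enough free space for large temporary files.
env = indexing_environment("/path/to/big/tmp", base={})
print(env)
```

The resulting mapping could be given as the env argument of subprocess.run when starting recollindex, which leaves the calling process's own environment unchanged.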