Introduction

Indexing is the process by which the set of documents is analyzed and the data entered into the database. Recoll indexing is normally incremental: documents are only processed if they have been modified since the last run. On the first execution, all documents need processing. A full index build can be forced later by passing the -z or -Z option to the indexing command (recollindex -z or recollindex -Z).

By default, recollindex skips files which caused an error during a previous pass. This is a performance optimization, and a new behaviour in version 1.21 (previous versions always retried failed files). Use the command line option -k to retry failed files, for example after updating a filter.
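The invocations mentioned so far can be gathered in a small shell wrapper. This is only a sketch: the function name run_index and its mode names are made up for illustration, and the comments distinguishing -z from -Z reflect our reading of the recollindex manual page, which should be checked for your version.

```shell
# run_index MODE: hypothetical wrapper around the recollindex options named above.
# MODE defaults to "incremental" when omitted.
run_index() {
    case "${1:-incremental}" in
        incremental) recollindex ;;     # only documents modified since the last run
        retry)       recollindex -k ;;  # incremental pass, also retrying failed files
        full)        recollindex -Z ;;  # full build (assumed: all documents reprocessed in place)
        rebuild)     recollindex -z ;;  # full build (assumed: index reset before indexing)
        *)
            echo "usage: run_index [incremental|retry|full|rebuild]" >&2
            return 2
            ;;
    esac
}
```

For example, run_index retry would be a reasonable choice after updating a filter, and plain run_index for routine updates.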

The following sections give an overview of different aspects of the indexing process and its configuration, with links to detailed sections.

Depending on your data, temporary files may be needed during indexing, some of them possibly quite big. You can use the RECOLL_TMPDIR or TMPDIR environment variables to choose where they are created (the default is to use /tmp). Using TMPDIR has the advantage that it may also be taken into account by auxiliary commands executed by recollindex.
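As an illustration, the temporary directory can be set for a single indexing pass from the shell. The sketch below assumes a POSIX shell; the function name index_with_tmpdir and the directory used in the usage line are hypothetical.

```shell
# index_with_tmpdir DIR: hypothetical helper that runs one indexing pass
# with temporary files created under DIR instead of /tmp.
index_with_tmpdir() {
    if [ -z "$1" ]; then
        echo "usage: index_with_tmpdir DIR" >&2
        return 2
    fi
    # Create the directory if needed; give up if that fails.
    mkdir -p "$1" || return 1
    # TMPDIR is set for this command only. Commands started by recollindex
    # inherit it and may use it as well.
    TMPDIR="$1" recollindex
}
```

A call such as index_with_tmpdir /var/tmp/recoll-scratch would then keep large temporary files off /tmp. RECOLL_TMPDIR could presumably be substituted in the same way when only Recoll itself should be affected.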