Running indexing

Indexing is always performed by the recollindex program, which can be started either from the command line or from the File menu in the recoll GUI program. When started from the GUI, the indexing will run on the same configuration recoll was started on. When started from the command line, recollindex will use the RECOLL_CONFDIR variable or accept a -c confdir option to specify a non-default configuration directory.

If the recoll program finds no index when it starts, it will automatically start indexing (except if canceled).

The recollindex indexing process can be interrupted by sending an interrupt (Ctrl-C, SIGINT) or terminate (SIGTERM) signal. Some time may elapse before the process exits, because it needs to properly flush and close the index. This can also be done from the recoll GUI FileStop Indexing menu entry.

After such an interruption, the index will be somewhat inconsistent because some operations which are normally performed at the end of the indexing pass will have been skipped (for example, the stemming and spelling databases will be inexistant or out of date). You just need to restart indexing at a later time to restore consistency. The indexing will restart at the interruption point (the full file tree will be traversed, but files that were indexed up to the interruption and for which the index is still up to date will not need to be reindexed).

recollindex has a number of other options which are described in its man page. Only a few will be described here.

Option -z will reset the index when starting. This is almost the same as destroying the index files (the nuance is that the Xapian format version will not be changed).

Option -Z will force the update of all documents without resetting the index first. This will not have the "clean start" aspect of -z, but the advantage is that the index will remain available for querying while it is rebuilt, which can be a significant advantage if it is very big (some installations need days for a full index rebuild).

Option -k will force retrying files which previously failed to be indexed, for example because of a missing helper program.

Of special interest also, maybe, are the -i and -f options. -i allows indexing an explicit list of files (given as command line parameters or read on stdin). -f tells recollindex to ignore file selection parameters from the configuration. Together, these options allow building a custom file selection process for some area of the file system, by adding the top directory to the skippedPaths list and using an appropriate file selection method to build the file list to be fed to recollindex -if. Trivial example:

	    find . -name indexable.txt -print | recollindex -if
	  

recollindex -i will not descend into subdirectories specified as parameters, but just add them as index entries. It is up to the external file selection method to build the complete file list.