== Using the log file to investigate indexing issues
All *Recoll* processes print trace messages. By default these go to the
standard error output, so you may never see them (for example, when the
*recoll* GUI is started from the desktop interface).
There are a number of potential issues with indexing that may need
investigation, such as:
- A file can't be found by searching even though it appears that it should
have been indexed (this could happen because the file is not selected at all
or because a filter program crashes).
- The indexing process gets stuck and never finishes.
- The indexing process ends up with an error.
- The indexing process seems to be using too much system capacity.
The right way to approach these problems is to use the *recollindex*
command line tool (instead of the *recoll* GUI), and to set up the
trace log to provide information about what indexing is actually doing.
Trace log parameters can be set either from the GUI _Preferences->Indexing
Configuration->Global Parameters_ panel, or by editing the configuration
file '~/.recoll/recoll.conf'. You should set the following parameters:
----
loglevel = 6
logfilename = stderr
thrQSizes = -1 -1 -1
----
We use _stderr_ instead of an actual file in order to capture direct filter
messages (such as a *python* stack trace) along with normal
*recollindex* messages.
The last line sets recollindex for single-threaded operation, which will
make the log much more readable.
You should then check that no *recoll* or *recollindex* process is
currently running, and kill any you find.
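A minimal sketch of that check, assuming a system where the standard `pgrep` utility is available. The function name `recoll_running` is only illustrative and is not part of *Recoll*.

```shell
# Succeeds (exit status 0) and lists the matches if a recoll or recollindex
# process is running; -x requires the whole process name to match.
recoll_running() {
    pgrep -l -x 'recoll|recollindex'
}

if recoll_running; then
    echo "still running: stop the processes listed above (e.g. with kill <pid>)"
else
    echo "no recoll or recollindex process found"
fi
```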
Then, if the issue concerns one identified file, try indexing just that file:
----
recollindex -i myunfindablefile.xxx > /tmp/myindexlog 2>&1
----
If this is a general issue with indexing (process not finishing properly),
just start it:
----
recollindex > /tmp/myindexlog 2>&1
----
Usually, a look at the trace will show what is wrong (e.g.: a
configuration issue or a missing filter) and let you solve the problem.
In case of indexer misbehaviour (e.g. using too much memory), you should run
_tail -f_ on the log to see what is going on.
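A level 6 trace can be long, so it may help to look at the most severe messages first. The sketch below assumes that every *recollindex* message starts with `:<level>:`, as in the sample log line shown in the path gotcha note; this prefix format is inferred from that one sample. The helper name `log_upto_level` is hypothetical.

```shell
# Print only the messages at or below a given verbosity level from a trace log.
# Assumes each message starts with ":<level>:" (inferred from a sample line such
# as ":4:../index/fsindexer.cpp:149:..."). Other lines, e.g. filter stack
# traces written directly to stderr, are dropped.
log_upto_level() {
    # $1: highest level to keep (e.g. 2 for errors), $2: log file
    awk -F: -v max="$1" '$1 == "" && $2 ~ /^[0-9]+$/ && $2 + 0 <= max' "$2"
}

# Example: show error-level messages from the log captured earlier
# log_upto_level 2 /tmp/myindexlog
```

Because direct filter output is filtered out here, it is still worth reading the full log around any error that this turns up.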
If this is not enough, please
link:http://bitbucket.org/medoc/recoll/issues/new[open a tracker issue] and
attach or link to the log data, or just email me (jfd at recoll.org).
*recollindex* and *recollindex -i* normally use the same criteria to
decide whether to include a file (but see the _Path gotcha_ note below).
They may still behave differently in some cases, so it can be useful to run a
full *recollindex* even for a specific file, but this will produce a
big log file.
When you are done, it is better to reset the verbosity to a reasonable
level (e.g.: +2+ : just errors, +3+ : information, listing indexed files).
=== Note: the path gotcha
*recollindex -i* will only index files under the directories defined by the
+topdirs+ configuration variable (your home directory by
default). Unfortunately, the test is done on the file path text, ignoring
possible symbolic links. If you give a simple file name as a parameter to
*recollindex -i* and there are symbolic links inside the +topdirs+
entries, the comparison may fail. For example, if your home directory is
'/home/me/' and '/home/' is a link to '/usr/home/', *recollindex -i
somefilename* will actually try to index '/usr/home/me/somefilename', and
fail (because '/usr/home/me/' is not a subdirectory of '/home/me/'). This
shows up in the log as a message like the following:
----
:4:../index/fsindexer.cpp:149:FsIndexer::indexFiles: skipping [/usr/home/me/somefile] (ntd)
----
If this happens, give a full path consistent with what is found in the
configuration file (e.g.: _recollindex -i /home/me/somefile_).
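One way to build such a path without typing it out is to use the shell's logical working directory, which keeps symbolic links unresolved. This is only a sketch: it assumes you reached the directory through the same path that +topdirs+ uses, and the helper name `logical_path` is hypothetical.

```shell
# Build an absolute path from the shell's logical working directory
# (symbolic links not resolved), so that it can match the topdirs text.
logical_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;                 # already absolute: keep as given
        *)  printf '%s/%s\n' "$(pwd -L)" "$1" ;;  # prepend the logical cwd
    esac
}

# Example (run from the directory holding the file):
# recollindex -i "$(logical_path somefile)" > /tmp/myindexlog 2>&1
```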
=== File system occupation
One of the possible reasons for failed indexing is a +maxfsoccup+
parameter set too low. This is the level of file system occupation (used
space, not free space) at which indexing stops. It is set from the GUI
indexing configuration or by editing 'recoll.conf'. A value of 0 disables
the check, but a very low non-zero value will prevent indexing altogether.
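To compare the current occupation with +maxfsoccup+, something like the following sketch may help. It relies on two points that are not stated above and should be treated as assumptions: that +maxfsoccup+ is a percentage comparable to the "Capacity" column of `df`, and that the file system that matters is the one holding the index (here taken to be under '~/.recoll'). The helper name `fs_occupation` is hypothetical.

```shell
# Report how full the file system holding a given path is, as an integer
# percentage. df -P prints POSIX-format output; on the second line, the 5th
# column is the capacity used (e.g. "42%").
fs_occupation() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example: compare the result with maxfsoccup in recoll.conf
# fs_occupation "$HOME/.recoll"
```

If the reported number is at or above a non-zero +maxfsoccup+, that would be consistent with indexing stopping early.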