The default location for the index data is the xapiandb subdirectory of the Recoll configuration directory, typically $HOME/.recoll/xapiandb/. This can be changed via two different methods (with different purposes):
You can specify a different configuration directory by setting the RECOLL_CONFDIR environment variable, or by using the -c option to the Recoll commands. This method would typically be used to index different areas of the file system to different indexes. For example, if you were to issue the following command:

    recoll -c ~/.indexes-email

then Recoll would use the configuration files stored in ~/.indexes-email/ and (unless specified otherwise in recoll.conf) would look for the index in ~/.indexes-email/xapiandb/.

Using multiple configuration directories and configuration options allows you to tailor multiple configurations and indexes to handle whatever subset of the available data you wish to make searchable.
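
As a rough sketch of the lookup just described, the hypothetical shell function below (not part of Recoll) prints the index directory implied by a configuration directory. It assumes that a directory given explicitly, as it would be with -c, is preferred over RECOLL_CONFDIR, and that dbdir is not set in recoll.conf.

```shell
# Hypothetical helper (not part of Recoll): print the default index directory
# for a configuration directory.
# Assumptions: an explicit directory wins over RECOLL_CONFDIR, and dbdir is
# not set in recoll.conf.
default_index_dir() {
    # $1: optional configuration directory, as it would be passed to -c
    confdir="${1:-${RECOLL_CONFDIR:-$HOME/.recoll}}"
    # Strip one trailing slash so the result has no doubled separator
    confdir="${confdir%/}"
    printf '%s/xapiandb\n' "$confdir"
}

default_index_dir                           # default configuration
default_index_dir "$HOME/.indexes-email"    # same directory as in the -c example
```

With neither an argument nor RECOLL_CONFDIR set, the first call prints the xapiandb subdirectory of $HOME/.recoll.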
For a given configuration directory, you can specify a non-default storage location for the index by setting the dbdir parameter in the configuration file (see the configuration section). This method is mainly useful if you want to keep the configuration directory in its default location but store the index elsewhere, typically because of disk space concerns.
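
The following is a minimal sketch of setting dbdir, using a scratch configuration directory so that no real configuration is touched. The directory names are placeholders chosen for illustration; the only thing taken from the text above is the dbdir parameter itself.

```shell
# Scratch configuration directory; a real one might be ~/.indexes-email
confdir="$(mktemp -d)"
# Placeholder target; in practice, a directory on a volume with more free space
indexdir="$confdir/elsewhere/xapiandb"

# Write a recoll.conf that sets dbdir.
# Note: '>' overwrites any existing file at that path.
printf 'dbdir = %s\n' "$indexdir" > "$confdir/recoll.conf"
cat "$confdir/recoll.conf"

# To build an index with this configuration, one would then run:
#   recollindex -c "$confdir"
```

On an existing configuration, you would edit recoll.conf and add or change the dbdir line rather than overwrite the file.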
The size of the index is determined by the size of the set of documents, but the ratio can vary a lot. For a typical mixed set of documents, the index size will often be close to the data set size. In specific cases (a set of compressed mbox files for example), the index can become much bigger than the documents. It may also be much smaller if the documents contain a lot of images or other non-indexed data (an extreme example being a set of mp3 files where only the tags would be indexed).
Of course, images, sound and video do not increase the index size, so nowadays (2012) even a big index is typically negligible compared with the total amount of data on the computer.
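
To see how large an index actually is, a sketch along the following lines may help. The dir_kb function is a hypothetical helper, and the path assumes the default index location (dbdir not set).

```shell
# Hypothetical helper: print the disk usage of a directory in kilobytes
dir_kb() {
    du -sk "$1" | cut -f1
}

# Assumes the index is in its default location under the configuration directory
index_dir="${RECOLL_CONFDIR:-$HOME/.recoll}/xapiandb"
if [ -d "$index_dir" ]; then
    echo "index size: $(dir_kb "$index_dir") KB"
fi
```

Running dir_kb on the indexed document directories as well gives the ratio discussed above.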
The index data directory (xapiandb) only contains data that can be completely rebuilt by an index run (as long as the original documents exist), and it can always be destroyed safely.
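
As an illustration of that last point, the sketch below deletes an index and leaves the rest of the configuration directory alone. The remove_index function is a hypothetical helper, it assumes the index is in its default location (dbdir not set), and it is demonstrated on a scratch directory only.

```shell
# Hypothetical helper: delete the index stored in a configuration directory's
# default location. Assumes dbdir is not set in recoll.conf.
remove_index() {
    # ${1:?} aborts if no directory is given, so this never expands to "/xapiandb"
    rm -rf -- "${1:?configuration directory required}/xapiandb"
}

# Demonstration on a scratch directory rather than a real configuration
scratch="$(mktemp -d)"
mkdir -p "$scratch/xapiandb"
remove_index "$scratch"
[ ! -e "$scratch/xapiandb" ] && echo "index removed"

# On a real configuration, a subsequent indexing run recreates the data:
#   recollindex -c ~/.indexes-email
```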