Diff of /src/INSTALL [8761a1] .. [2b80c7]

--- a/src/INSTALL
+++ b/src/INSTALL
@@ -2,3 +2,1070 @@
 More documentation can be found in the doc/ directory or at http://www.recoll.org
 
 
+Chapter 5. Installation and configuration
+
+5.1. Installing a binary copy
+
+   There are three types of binary Recoll installations:
+
+     o Through your system's normal software distribution framework (e.g.
+       Debian/Ubuntu apt or FreeBSD ports).
+
+     o From a package downloaded from the Recoll web site.
+
+     o From a prebuilt tree downloaded from the Recoll web site.
+
+   In all cases, the strict software dependencies (e.g. on Xapian or iconv)
+   will be satisfied automatically; you should not have to worry about them.
+
+   You will only have to check or install supporting applications for the
+   file types that you want to index beyond those that are natively processed
+   by Recoll (text, HTML, email files, and a few others).
+
+   You may also want to look at the configuration section (though this may
+   not be necessary for a quick test with default parameters). Most
+   parameters can be set more conveniently from the GUI.
+
+  5.1.1. Installing through a package system
+
+   If you use a BSD-type port system or a prebuilt package (DEB or RPM,
+   installed manually or through the system software configuration utility),
+   just follow the usual procedure for your system.
+
+  5.1.2. Installing a prebuilt Recoll
+
+   The unpackaged binary versions on the Recoll web site are just compressed
+   tar files of a build tree, where only the useful parts were kept
+   (executables and sample configuration).
+
+   The executable binary files are statically linked with libxapian and
+   libiconv, to make installation easier (no dependencies).
+
+   After extracting the tar file, you can proceed with installation as if you
+   had built the package from source (that is, just type make install). The
+   binary trees are built for installation to /usr/local.
+
+5.2. Supporting packages
+
+   Recoll uses external applications to index some file types. You need to
+   install them for the file types that you wish to have indexed. These are
+   optional run-time dependencies: none is needed for building or running
+   Recoll, except for indexing their specific file type.
+
+   After an indexing pass, the commands that were found missing can be
+   displayed from the recoll File menu. The list is stored in the missing
+   text file inside the configuration directory.
+
+   A list of common file types which need external commands follows. Many of
+   the filters need the iconv command, which is not always listed as a
+   dependency.
+
+   Please note that, due to the relatively dynamic nature of this
+   information, the most up-to-date version is now kept on the Recoll helper
+   applications page, along with links to the home pages or best
+   source/patches pages, and miscellaneous tips. The list below is not
+   updated often and may be quite stale.
+
+   For many Linux distributions, most of the commands listed can be installed
+   from the package repositories. However, the packages are sometimes
+   outdated, or not the best version for Recoll, so you should take a look at
+   the Recoll helper applications page if a file type is important to you.
+
+   As of Recoll release 1.14, a number of XML-based formats that were handled
+   by ad hoc filter code now use the xsltproc command, which usually comes
+   with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.
+
+   Now for the list:
+
+     o Openoffice files need unzip and xsltproc.
+
+     o PDF files need pdftotext which is part of the Xpdf or Poppler
+       packages.
+
+     o Postscript files need pstotext. The original version has an issue with
+       shell special characters in file names, which is corrected in recent
+       packages. See the Recoll helper applications page for more detail.
+
+     o MS Word needs antiword. It is also useful to have wvWare installed as
+       it may be used as a fallback for some files which antiword does not
+       handle.
+
+     o MS Excel and PowerPoint need catdoc.
+
+     o MS Open XML (docx) needs xsltproc.
+
+     o Wordperfect files need wpd2html from the libwpd (or libwpd-tools on
+       Ubuntu) package.
+
+     o RTF files need unrtf, which, in its standard version, has much trouble
+       with non-western character sets. Check the Recoll helper applications
+       page.
+
+     o TeX files need untex or detex. Check the Recoll helper applications
+       page for sources if it's not packaged for your distribution.
+
+     o dvi files need dvips.
+
+     o djvu files need djvutxt and djvused from the DjVuLibre package.
+
+     o Audio files: Recoll releases before 1.13 used the id3info command from
+       the id3lib package to extract mp3 tag information, metaflac (standard
+       flac tools) for flac files, and ogginfo (vorbis tools) for ogg files.
+       Releases 1.14 and later use a single Python filter based on mutagen
+       for all audio file types.
+
+     o Pictures: Recoll uses the Exiftool Perl package to extract tag
+       information. Most image file formats are supported. Note that there
+       may not be much interest in indexing the technical tags (image size,
+       aperture, etc.). This is only of interest if you store personal tags
+       or textual descriptions inside the image files.
+
+     o chm: files in Microsoft help format need Python and the pychm module
+       (which needs chmlib).
+
+     o ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar
+       module. icalendar is not needed for newer versions, which use internal
+       code.
+
+     o Zip archives need Python (and the standard zipfile module).
+
+     o Rar archives need Python, the rarfile Python module and the unrar
+       utility.
+
+     o Midi karaoke files need Python and the Midi module.
+
+     o Konqueror webarchive files need Python (uses the tarfile module).
+
+     o mimehtml web archive files are supported through the email filter,
+       which introduces some mild weirdness, but is still usable.
+
+   Text, HTML, email folders, and Scribus files are processed internally. Lyx
+   is used to index Lyx files. Many filters need iconv and the standard sed
+   and awk.
+
+5.3. Building from source
+
+  5.3.1. Prerequisites
+
+   C++ compiler. Up to Recoll version 1.13.04, its absence can manifest
+   itself by strange messages about a missing iconv_open.
+
+   Development files for Xapian core.
+
+  Important
+
+   If you are building Xapian for an older CPU (before Pentium 4 or Athlon
+   64), you need to add the --disable-sse flag to the configure command.
+   Otherwise all Xapian applications will crash with an illegal instruction
+   error.
+
+   Development files for Qt.
+
+   Development files for X11 and zlib.
+
+   Check the Recoll download page for up to date version information.
+
+   You will most probably be able to find a binary package for Qt for your
+   system. You may have to compile Xapian but this is not difficult (if you
+   are using FreeBSD, there is a port).
+
+   You may also need libiconv. Recoll currently uses version 1.9 (this should
+   not be critical). On Linux systems, the iconv interface is part of libc
+   and you should not need to do anything special.
+
+  5.3.2. Building
+
+   Recoll has been built on Linux, FreeBSD, Mac OS X, and Solaris. Most
+   versions after 2005 should be ok, and maybe some older ones too (Solaris 8
+   is ok). If you build on another system, and need to modify things, I would
+   very much welcome patches.
+
+   Depending on the Qt 3 configuration on your system, you may have to set
+   the QTDIR and QMAKESPECS variables in your environment:
+
+     o QTDIR should point to the directory above the one that holds the qt
+       include files (e.g. if qt.h is /usr/local/qt/include/qt.h, QTDIR
+       should be /usr/local/qt).
+
+     o QMAKESPECS should be set to the name of one of the Qt mkspecs
+       sub-directories (e.g. linux-g++).
+
+   On many Linux systems, QTDIR is set by the login scripts, and QMAKESPECS
+   is not needed because there is a default link in mkspecs/.
+
+   Neither QTDIR nor QMAKESPECS should be needed with Qt 4: configuration
+   details are entirely determined by qmake (which is quite often installed
+   as qmake-qt4).
+
+   Configure options: 
+
+     o --without-aspell will disable the code for phonetic matching of search
+       terms.
+
+     o --with-fam or --with-inotify will enable the code for real time
+       indexing. Inotify support is enabled by default on recent Linux
+       systems.
+
+     o --disable-webkit is available from version 1.17 to implement the
+       result list with a Qt QTextBrowser instead of a WebKit widget if you
+       do not or can't depend on the latter.
+
+     o --enable-xattr will enable code to fetch data from file extended
+       attributes. This is only useful if some application stores data in
+       there, and also needs some simple configuration (see comments in the
+       fields configuration file).
+
+     o --enable-camelcase will enable splitting camelCase words. This is not
+       enabled by default as it has the unfortunate side-effect of making
+       some phrase searches quite confusing: e.g. "MySQL manual" would be
+       matched by "MySQL manual" and "my sql manual" but not "mysql manual"
+       (only inside phrase searches).
+
+     o --with-file-command Specify the version of the 'file' command to use
+       (e.g. --with-file-command=/usr/local/bin/file). Can be useful to
+       enable the GNU version on systems where the native one is bad.
+
+     o --disable-qtgui Disable the Qt interface. Will allow building the
+       indexer and the command line search program in absence of a Qt
+       environment.
+
+     o --disable-x11mon Disable X11 connection monitoring inside recollindex.
+       Together with --disable-qtgui, this allows building recoll without Qt
+       and X11.
+
+     o The usual autoconf configure options, such as --prefix, also apply.
+
+   Normal procedure:
+
+         cd recoll-xxx
+         ./configure
+         make
+         (practices usual hardship-repelling invocations)
+      
+
+   There is little auto-configuration. The configure script will mainly link
+   one of the system-specific files in the mk directory to mk/sysconf. If
+   your system is not known yet, it will tell you as much, and you may want
+   to manually copy and modify one of the existing files (the new file name
+   should be the output of uname -s).
+
+  5.3.3. Installation
+
+   Either type make install or execute recollinstall prefix, in the root of
+   the source tree. This will copy the commands to prefix/bin and the sample
+   configuration files, scripts and other shared data to prefix/share/recoll.
+
+   If the installation prefix given to recollinstall is different from either
+   the system default or the value which was specified when executing
+   configure (as in configure --prefix /some/path), you will have to set the
+   RECOLL_DATADIR environment variable to indicate where the shared data is
+   to be found (e.g. for (ba)sh: export
+   RECOLL_DATADIR=/some/path/share/recoll).
+
+   You can then proceed to configuration.
+
+5.4. Configuration overview
+
+   Most of the parameters specific to the recoll GUI are set through the
+   Preferences menu and stored in the standard Qt place
+   ($HOME/.config/Recoll.org/recoll.conf). You probably do not want to edit
+   this by hand.
+
+   Recoll indexing options are set inside text configuration files located in
+   a configuration directory. There can be several such directories, each of
+   which defines the parameters for one index.
+
+   The configuration files can be edited by hand or through the Index
+   configuration dialog (Preferences menu). The GUI tool will try to respect
+   your formatting and comments as much as possible, so it is quite possible
+   to use both ways.
+
+   The most accurate documentation for the configuration parameters is given
+   by comments inside the default files, and we will just give a general
+   overview here.
+
+   For each index, there are two sets of configuration files. System-wide
+   configuration files are kept in a directory named like
+   /usr/[local/]share/recoll/examples, and define default values, shared by
+   all indexes. For each index, a parallel set of files defines the
+   customized parameters.
+
+   The default location of the configuration is the .recoll directory in your
+   home directory. Most people will only use this directory.
+
+   This location can be changed, or others can be added with the
+   RECOLL_CONFDIR environment variable or the -c option parameter to recoll
+   and recollindex.
+
+   If the .recoll directory does not exist when recoll or recollindex is
+   started, it will be created with a set of empty configuration files.
+   recoll will give you a chance to edit the configuration file before
+   starting indexing. recollindex will proceed immediately. To avoid
+   mistakes, the automatic directory creation will only occur for the default
+   location, not if -c or RECOLL_CONFDIR were used (in the latter cases, you
+   will have to create the directory).
+
+   All configuration files share the same format. For example, a short
+   extract of the main configuration file might look as follows:
+
+         # Space-separated list of directories to index.
+         topdirs =  ~/docs /usr/share/doc
+
+         [~/somedirectory-with-utf8-txt-files]
+         defaultcharset = utf-8
+        
+
+   There are three kinds of lines:
+
+     o Comment (starts with #) or empty.
+
+     o Parameter assignment (name = value).
+
+     o Section definition ([somedirname]).
+
+   Depending on the type of configuration file, section definitions either
+   separate groups of parameters or allow redefining some parameters for a
+   directory sub-tree. They stay in effect until another section definition,
+   or the end of file, is encountered. Some of the parameters used for
+   indexing are looked up hierarchically from the current directory location
+   upwards. Not all parameters can be meaningfully redefined; this is
+   specified for each in the next section.
+
+   When found at the beginning of a file path, the tilde character (~) is
+   expanded to the name of the user's home directory, as a shell would do.
+
+   White space is used for separation inside lists. List elements with
+   embedded spaces can be quoted using double-quotes.
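+
+   As an illustration of the tilde and quoting rules, a topdirs list mixing
+   plain and quoted elements might be written as follows (a sketch only: the
+   directory names are made up for the example):
+
+         # Hypothetical paths. The second element contains a space, so it
+         # is enclosed in double quotes.
+         topdirs = ~/docs "/home/me/My Documents"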
+
+   Encoding issues. Most of the configuration parameters are plain ASCII. Two
+   particular sets of values may cause encoding issues:
+
+     o File path parameters may contain non-ASCII characters and should use
+       the exact same byte values as found in the file system directory.
+       Usually, this means that the configuration file should use the system
+       default locale encoding.
+
+     o The unac_except_trans parameter should be encoded in UTF-8. If your
+       system locale is not UTF-8, and you need to also specify non-ASCII
+       file paths, this poses a difficulty because common text editors cannot
+       handle multiple encodings in a single file. In this relatively
+       unlikely case, you can edit the configuration file as two separate
+       text files with appropriate encodings, and concatenate them to create
+       the complete configuration.
+
+  5.4.1. Main configuration file
+
+   recoll.conf is the main configuration file. It defines things like what to
+   index (top directories and things to ignore), and the default character
+   set to use for document types which do not specify it internally.
+
+   The default configuration will index your home directory. If this is not
+   appropriate, start recoll to create a blank configuration, click Cancel,
+   and edit the configuration file before restarting the command. This will
+   start the initial indexing, which may take some time.
+
+   Most of the following parameters can be changed from the Index
+   Configuration menu in the recoll interface. Some can only be set by
+   editing the configuration file.
+
+    5.4.1.1. Parameters affecting what documents we index:
+
+   topdirs
+
+           Specifies the list of directories or files to index (recursively
+           for directories). You can use symbolic links as elements of this
+           list. See the followLinks option about following symbolic links
+           found under the top elements (not followed by default).
+
+   skippedNames
+
+           A space-separated list of patterns for names of files or
+           directories that should be completely ignored. The list defined in
+           the default file is:
+
+ skippedNames = #* bin CVS  Cache cache* caughtspam  tmp .thumbnails .svn \
+                *~ .beagle .git .hg .bzr loop.ps .xsession-errors \
+                .recoll* xapiandb recollrc recoll.conf
+
+           The list can be redefined at any sub-directory in the indexed
+           area.
+
+           The top-level directories are not affected by this list (that is,
+           a directory in topdirs might match and would still be indexed).
+
+           The list in the default configuration does not exclude hidden
+           directories (names beginning with a dot), which means that it may
+           index quite a few things that you do not want. On the other hand,
+           email user agents like thunderbird usually store messages in
+           hidden directories, and you probably want this indexed. One
+           possible solution is to have .* in skippedNames, and add things
+           like ~/.thunderbird or ~/.evolution in topdirs.
+
+           Not even the file names are indexed for patterns in this list. See
+           the recoll_noindex variable in mimemap for an alternative approach
+           which indexes the file names.
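+
+           The hidden-directory approach suggested above could be sketched
+           as follows, assuming that the default list is kept and that
+           Thunderbird mail is the only hidden data to index (~/docs is a
+           placeholder for your own document directories):
+
+ skippedNames = #* bin CVS  Cache cache* caughtspam  tmp .thumbnails .svn \
+                *~ .beagle .git .hg .bzr loop.ps .xsession-errors \
+                .recoll* xapiandb recollrc recoll.conf .*
+ topdirs = ~/docs ~/.thunderbird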
+
+   skippedPaths and daemSkippedPaths
+
+           A space-separated list of patterns for paths of files or
+           directories that should be skipped. There is no default in the
+           sample configuration file, but the code always adds the
+           configuration and database directories in there.
+
+           skippedPaths is used both by batch and real time indexing.
+           daemSkippedPaths can be used to specify things that should be
+           indexed at startup, but not monitored.
+
+           Example of use for skipping text files only in a specific
+           directory:
+
+ skippedPaths = ~/somedir/*.txt
+              
+
+   skippedPathsFnmPathname
+
+           The values in the *skippedPaths variables are matched by default
+           with fnmatch(3), with the FNM_PATHNAME and FNM_LEADING_DIR flags.
+           This means that '/' characters must be matched explicitly. You
+           can set skippedPathsFnmPathname to 0 to disable the use of
+           FNM_PATHNAME (meaning that /*/dir3 will match /dir1/dir2/dir3).
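+
+           For instance, the following sketch reuses the made-up dir3 name
+           from the description: with FNM_PATHNAME disabled, '*' may match
+           '/' characters, so a path such as /dir1/dir2/dir3 is skipped:
+
+ skippedPathsFnmPathname = 0
+ skippedPaths = /*/dir3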
+
+   followLinks
+
+           Specifies if the indexer should follow symbolic links while
+           walking the file tree. The default is to ignore symbolic links to
+           avoid multiple indexing of linked files. No effort is made to
+           avoid duplication when this option is set to true. This option can
+           be set individually for each of the topdirs members by using
+           sections. It cannot be changed below the topdirs level.
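+
+           A per-directory setting could look like the following sketch,
+           where the two topdirs entries are examples and 1 is assumed to
+           be an accepted spelling of true (as in indexStripChars = 1):
+
+ topdirs = ~/docs /usr/share/doc
+
+ [~/docs]
+ followLinks = 1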
+
+   indexedmimetypes
+
+           Recoll normally indexes any file which it knows how to read. This
+           list lets you restrict the indexed mime types to what you specify.
+           If the variable is unspecified or the list empty (the default),
+           all supported types are processed.
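+
+           For example, to restrict indexing to HTML and PDF documents, the
+           list might be set as below. The values are ordinary MIME type
+           names chosen for illustration, and the list is assumed to be
+           space-separated like the other list parameters:
+
+ indexedmimetypes = text/html application/pdf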
+
+   compressedfilemaxkbs
+
+           Size limit for compressed (.gz or .bz2) files. These need to be
+           decompressed in a temporary directory for identification, which
+           can be very wasteful if 'uninteresting' big compressed files are
+           present. Negative means no limit, 0 means no processing of any
+           compressed file. Defaults to -1.
+
+   textfilemaxmbs
+
+           Maximum size for text files, in megabytes. Very big text files
+           are often uninteresting logs. Set to -1 to disable the limit
+           (default: 20 MB).
+
+   textfilepagekbs
+
+           If set to other than -1, text files will be indexed as multiple
+           documents of the given page size. This may be useful if you do
+           want to index very big text files as it will both reduce memory
+           usage at index time and help with loading data to the preview
+           window. A size of a few megabytes would seem reasonable (default:
+           1MB).
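+
+           As a sketch with arbitrary values, and assuming from the
+           parameter names that the units are megabytes and kilobytes
+           respectively, the following would accept text files up to 50 MB
+           and index them as pages of about 2 MB:
+
+ textfilemaxmbs = 50
+ textfilepagekbs = 2000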
+
+   membermaxkbs
+
+           This defines the maximum size in kilobytes for an archive member
+           (zip, tar or rar at the moment). Bigger entries will be skipped.
+
+   indexallfilenames
+
+           Recoll indexes file names in a special section of the database to
+           allow specific file name searches using wild cards. This
+           parameter decides if file name indexing is performed only for
+           files with mime types that would qualify them for full text
+           indexing, or for all files inside the selected subtrees,
+           independently of mime type.
+
+   usesystemfilecommand
+
+           Decide if we use the file -i system command as a final step for
+           determining the mime type for a file (the main procedure uses
+           suffix associations as defined in the mimemap file). This can be
+           useful for files with suffix-less names, but it will also cause
+           the indexing of many bogus "text" files.
+
+   processwebqueue
+
+           If this is set, process the directory where Web browser plugins
+           copy visited pages for indexing.
+
+   webqueuedir
+
+           The path to the web indexing queue. This is hard-coded in the
+           Firefox plugin as ~/.recollweb/ToIndex so there should be no need
+           to change it.
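+
+           A minimal sketch for enabling indexing of visited web pages,
+           assuming 1 is accepted as a true value and keeping the queue
+           location used by the Firefox plugin:
+
+ processwebqueue = 1
+ webqueuedir = ~/.recollweb/ToIndex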
+
+    5.4.1.2. Parameters affecting how we generate terms:
+
+   Changing some of these parameters will imply a full reindex. Also, when
+   using multiple indexes, it may not make sense to search indexes that don't
+   share the values for these parameters, because they usually affect both
+   search and index operations.
+
+   indexStripChars
+
+           Decide if we strip characters of diacritics and convert them to
+           lower-case before terms are indexed. If we don't, searches
+           sensitive to case and diacritics can be performed, but the index
+           will be bigger, and some marginal weirdness may sometimes occur.
+           The default is a stripped index (indexStripChars = 1) for now.
+           When using multiple indexes for a search, this parameter must be
+           defined identically for all. Changing the value implies an index
+           reset.
+
+   maxTermExpand
+
+           Maximum expansion count for a single term (e.g.: when using
+           wildcards). The default of 10000 is reasonable and will avoid
+           queries that appear frozen while the engine is walking the term
+           list.
+
+   maxXapianClauses
+
+           Maximum number of elementary clauses we can add to a single Xapian
+           query. In some cases, the result of term expansion can be
+           multiplicative, and we want to avoid using excessive memory. The
+           default of 100 000 should be both high enough in most cases and
+           compatible with current typical hardware configurations.
+
+   nonumbers
+
+           If this is set to true, no terms will be generated for numbers.
+           For example "123", "1.5e6", and "192.168.1.4" would not be indexed
+           ("value123" would still be). Numbers are often quite interesting
+           to search for, and this should probably not be set except for
+           special situations, e.g. scientific documents with huge amounts of
+           numbers in them. This can only be set for a whole index, not for a
+           subtree.
+
+   nocjk
+
+           If this is set to true, specific East Asian (Chinese, Korean,
+           Japanese) character/word splitting is turned off. This will save a
+           small amount of CPU if you have no CJK documents. If your document
+           base does include such text but you are not interested in
+           searching it, setting nocjk may be a significant time and space
+           saver.
+
+   cjkngramlen
+
+           This lets you adjust the size of n-grams used for indexing CJK
+           text. The default value of 2 is probably appropriate in most
+           cases. A value of 3 would allow more precision and efficiency on
+           longer words, but the index will be approximately twice as large.
+
+   indexstemminglanguages
+
+           A list of languages for which the stem expansion databases will be
+           built. See recollindex(1) or use the recollindex -l command for
+           possible values. You can add a stem expansion database for a
+           different language by using recollindex -s, but it will be deleted
+           during the next indexing. Only languages listed in the
+           configuration file are permanent.
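+
+           For instance, assuming that english and french appear in the
+           output of recollindex -l on your system, permanent stem
+           expansion databases for both could be requested with:
+
+ indexstemminglanguages = english french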
+
+   defaultcharset
+
+           The name of the character set used for files that do not contain a
+           character set definition (e.g. plain text files). This can be
+           redefined for any sub-directory. If it is not set at all, the
+           character set used is the one defined by the NLS environment
+           (LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
+
+   unac_except_trans
+
+           This is a list of characters, encoded in UTF-8, which should be
+           handled specially when converting text to unaccented lowercase.
+           For example, in Swedish, the letter a with diaeresis has full
+           alphabet citizenship and should not be turned into an a. Each
+           element in the space-separated list has the special character as
+           first element and the translation following. The handling of both
+           the lowercase and upper-case versions of a character should be
+           specified, as membership in the list will turn off both standard
+           accent and case processing. Example for Swedish:
+
+ unac_except_trans =  åå Åå ää Ää öö Öö
+            
+
+           Note that the translation is not limited to a single character,
+           you could very well have something like üue in the list.
+
+           The default value set for unac_except_trans can't be listed here
+           because I have trouble with SGML and UTF-8, but it only contains
+           ligature decompositions: german ss, oe, ae, fi, fl.
+
+           This parameter can't be defined for subdirectories: it is global,
+           because there is no way to do otherwise when querying. If you have
+           document sets which would need different values, you will have to
+           index and query them separately.
+
+   maildefcharset
+
+           This can be used to define the default character set specifically
+           for email messages which don't specify it. This is mainly useful
+           for readpst (libpst) dumps, which are utf-8 but do not say so.
+
+   localfields
+
+           This allows setting fields for all documents under a given
+           directory. Typical usage would be to set an "rclaptg" field, to be
+           used in mimeview to select a specific viewer. If several fields
+           are to be set, they should be separated with a colon (':')
+           character (which there is currently no way to escape). For
+           example, set localfields = rclaptg=gnus:other = val, then select
+           the specific viewer with mimetype|tag=... in mimeview.
+
+    5.4.1.3. Parameters affecting where and how we store things:
+
+   dbdir
+
+           The name of the Xapian data directory. It will be created if
+           needed when the index is initialized. If this is not an absolute
+           path, it will be interpreted relative to the configuration
+           directory. The value can have embedded spaces but leading or
+           trailing spaces will be trimmed. You cannot use quotes here.
+
+   idxstatusfile
+
+           The name of the scratch file where the indexer process updates its
+           status. Default: idxstatus.txt inside the configuration directory.
+
+   maxfsoccuppc
+
+           Maximum file system occupation before we stop indexing. The value
+           is a percentage, corresponding to what the "Capacity" df output
+           column shows. The default value is 0, meaning no checking.
+
+   mboxcachedir
+
+           The directory where mbox message offsets cache files are held.
+           This is normally $RECOLL_CONFDIR/mboxcache, but it may be useful
+           to share a directory between different configurations.
+
+   mboxcacheminmbs
+
+           The minimum mbox file size over which we cache the offsets. There
+           is really no sense in caching offsets for small files. The default
+           is 5 MB.
+
+   webcachedir
+
+           This is only used by the web browser plugin indexing code, and
+           defines where the cache for visited pages will live. Default:
+           $RECOLL_CONFDIR/webcache
+
+   webcachemaxmbs
+
+           This is only used by the web browser plugin indexing code, and
+           defines the maximum size for the web page cache. Default: 40 MB.
+
+   idxflushmb
+
+           Threshold (megabytes of new text data) at which the in-memory
+           index data is flushed to disk. Setting this can help control
+           memory usage. A value of 0 means no explicit flushing: Xapian
+           then uses its own default, which is to flush every 10000
+           documents (or XAPIAN_FLUSH_THRESHOLD). This gives little control
+           over memory usage, which also depends on the average document
+           size. The default value is 10, which is probably a bit low. If
+           your system usually has free memory, you can try higher values,
+           between 20 and 80. In my experience, values beyond 100 are always
+           counterproductive.
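+
+           As an illustration, a configuration favouring larger flushes and
+           guarding against a full disk might contain the following. The
+           values are arbitrary examples, not recommendations:
+
+ # Relative path: interpreted relative to the configuration directory
+ dbdir = xapiandb
+ # Stop indexing when the file system is 90% full
+ maxfsoccuppc = 90
+ # Flush to disk after every 50 MB of new text data
+ idxflushmb = 50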
+
+    5.4.1.4. Miscellaneous parameters:
+
+   autodiacsens
+
+           If the index is not stripped, decide if we automatically trigger
+           diacritics sensitivity if the search term has accented characters
+           (not in unac_except_trans). Else you need to use the query
+           language and the D modifier to specify diacritics sensitivity.
+           Default is no.
+
+   autocasesens
+
+           If the index is not stripped, decide if we automatically trigger
+           character case sensitivity if the search term has upper-case
+           characters in any but the first position. Else you need to use the
+           query language and the C modifier to specify character-case
+           sensitivity. Default is yes.
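+
+           For example, to make accented search terms trigger diacritics
+           sensitivity while turning off automatic case sensitivity (a
+           sketch, assuming an index which is not stripped, and assuming
+           that yes and no are accepted as values, as the stated defaults
+           suggest):
+
+ autodiacsens = yes
+ autocasesens = no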
+
+   loglevel,daemloglevel
+
+           Verbosity level for recoll and recollindex. A value of 4 lists
+           quite a lot of debug/information messages. 2 only lists errors.
+           The daemloglevel variant is specific to the indexing monitor
+           daemon.
+
+   logfilename, daemlogfilename
+
+           Where the messages should go. 'stderr' can be used as a special
+           value, and is the default. The daemlogfilename variant is specific
+           to the indexing monitor daemon.
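+
+           A possible setup, keeping the interactive programs quiet while
+           getting detailed messages from the monitor daemon (the log file
+           path is a made-up example):
+
+ # Errors only for recoll and recollindex
+ loglevel = 2
+ logfilename = stderr
+ # Debug/information messages for the indexing monitor daemon
+ daemloglevel = 4
+ # Hypothetical location, adjust to taste
+ daemlogfilename = /tmp/recollindex-monitor.log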
+
+   mondelaypatterns
+
+           This allows specifying wildcard path patterns (processed with
+           fnmatch(3), with flags set to 0) to match files which change too
+           often and for which a delay should be observed before
+           re-indexing. This is a space-separated list, each entry being a
+           pattern and a time in seconds, separated by a colon. You can use
+           double quotes if a path entry contains white space. Example:
+
+ mondelaypatterns = *.log:20 "this one has spaces*:10"
+              
+
+   monixinterval
+
+           Minimum interval (seconds) for processing the indexing queue. The
+           real time monitor does not process each event when it comes in,
+           but will wait this time for the queue to accumulate to diminish
+           overhead and in order to aggregate multiple events to the same
+           file. Default: 30 seconds.
+
+   monauxinterval
+
+           Period (in seconds) at which the real time monitor will regenerate
+           the auxiliary databases (spelling, stemming) if needed. The
+           default is one hour.
+
+   monioniceclass, monioniceclassdata
+
+           These allow defining the ionice class and data used by the indexer
+           (default class 3, no data).
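+
+           As a sketch, the following would process the queue at most once
+           a minute and regenerate the auxiliary databases every two hours,
+           while keeping the default ionice class. The values are
+           illustrative only:
+
+ # Seconds between runs of the indexing queue
+ monixinterval = 60
+ # Seconds between auxiliary database updates (2 hours)
+ monauxinterval = 7200
+ monioniceclass = 3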
+
+   filtermaxseconds
+
+           Maximum filter execution time, after which it is aborted. Some
+           PostScript programs just loop...
+
+   filtersdir
+
+           A directory to search for the external filter scripts used to
+           index some types of files. The value should not be changed, except
+           if you want to modify one of the default scripts. The value can be
+           redefined for any sub-directory.
+
+   iconsdir
+
+           The name of the directory where recoll result list icons are
+           stored. You can change this if you want different images.
+
+   idxabsmlen
+
+           Recoll stores an abstract for each indexed file inside the
+           database. The text can come from an actual 'abstract' section in
+           the document or will just be the beginning of the document. It is
+           stored in the index so that it can be displayed inside the result
+           lists without decoding the original file. The idxabsmlen parameter
+           defines the size of the stored abstract. The default value is 250
+           bytes. The search interface gives you the choice to display this
+           stored text or a synthetic abstract built by extracting text
+           around the search terms. If you always prefer the synthetic
+           abstract, you can reduce this value and save a little space.
+
+   aspellLanguage
+
+           Language definitions to use when creating the aspell dictionary.
+           The value must match a set of aspell language definition files.
+           You can type "aspell config" to see where these are installed
+           (look for data-dir). The default if the variable is not set is to
+           use your desktop national language environment to guess the value.
+
+   noaspell
+
+           If this is set, the aspell dictionary generation is turned off.
+           Useful for cases where you don't need the functionality or when it
+           is unusable because aspell crashes during dictionary generation.
+
+   mhmboxquirks
+
+           This allows defining location-related quirks for the mailbox
+           handler. Currently only the tbird flag is defined, and it should
+           be set for directories which hold Thunderbird data, as their
+           folder format is weird.
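+
+           A sketch of how this could look, assuming that the Thunderbird
+           data lives under ~/.thunderbird and that a directory section is
+           used to restrict the setting to that location:
+
+ # Assumed location of the Thunderbird folders
+ [~/.thunderbird]
+ mhmboxquirks = tbird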
+
+  5.4.2. The fields file
+
+   This file contains information about dynamic fields handling in Recoll.
+   Some very basic fields have hard-wired behaviour, and, mostly, you should
+   not change the original data inside the fields file. But you can create
+   custom fields fitting your data and handle them just like they were native
+   ones.
+
+   The fields file has several sections, which each define an aspect of
+   fields processing. Quite often, you'll have to modify several sections to
+   obtain the desired behaviour.
+
+   We will only give a short description here; you should refer to the
+   comments inside the file for more detailed information.
+
+   Field names should be lowercase alphabetic ASCII.
+
+   [prefixes]
+
+           A field becomes indexed (searchable) by having a prefix defined in
+           this section.
+
+   [stored]
+
+           A field becomes stored (displayable inside results) by having its
+           name listed in this section (typically with an empty value).
+
+   [aliases]
+
+           This section defines lists of synonyms for the canonical names
+           used inside the [prefixes] and [stored] sections.
+
+   filter-specific sections
+
+           Some filters may need specific configuration for handling fields.
+           Only the email message filter currently has such a section (named
+           [mail]). It allows indexing arbitrary email headers in addition to
+           the ones indexed by default. Other such sections may appear in the
+           future.
+
+   Here follows a small example of a personal fields file. This would extract
+   a specific email header and use it as a searchable field, with data
+   displayable inside result lists. (Side note: as the email filter does no
+   decoding on the values, only plain ASCII headers can be indexed, and only
+   the first occurrence will be used for headers that occur several times.)
+
+ [prefixes]
+ # Index mailmytag contents (with the given prefix)
+ mailmytag = XMTAG
+
+ [stored]
+ # Store mailmytag inside the document data record (so that it can be
+ # displayed - as %(mailmytag) - in result lists).
+ mailmytag =
+
+ [mail]
+ # Extract the X-My-Tag mail header, and use it internally with the
+ # mailmytag field name
+ x-my-tag = mailmytag
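+
+   Continuing this example, an [aliases] entry could let the field be
+   referred to by a shorter name. This is only a sketch: the mytag alias is
+   made up, and the assumed syntax is the canonical name followed by its
+   list of synonyms.
+
+ [aliases]
+ # mytag becomes another name for the mailmytag field (hypothetical alias)
+ mailmytag = mytag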
+
+  5.4.3. The mimemap file
+
+   mimemap specifies the file name extension to mime type mappings.
+
+   For file names without an extension, or with an unknown one, the system's
+   file -i command will be executed to determine the mime type (this can be
+   switched off inside the main configuration file).
+
+   The mappings can be specified on a per-subtree basis, which may be useful
+   in some cases. Example: gaim logs have a .txt extension but should be
+   handled specially, which is possible because they are usually all located
+   in one place.
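+
+   For instance, assuming the logs are stored under ~/.gaim/logs and using
+   text/x-gaim-log as the type name (both are illustrative), the subtree
+   mapping could be written as:
+
+ # Only applies to files below this directory (assumed location)
+ [~/.gaim/logs]
+ .txt = text/x-gaim-log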
+
+   mimemap also has a recoll_noindex variable which is a list of suffixes.
+   Matching files will be skipped (which avoids unnecessary decompressions or
+   file executions). This is partially redundant with skippedNames in the
+   main configuration file, with a few differences: it will not affect
+   directories, it cannot be made dependent on the file-system location (it
+   is a configuration-wide parameter), and the file names will still be
+   indexed (not even the file names are indexed for patterns in
+   skippedNames). recoll_noindex is used mostly for things known to be
+   unindexable by a given Recoll version. Having it there avoids cluttering
+   the more user-oriented and locally customized skippedNames.
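+
+   A short sketch of such a list, assuming that the suffixes are separated
+   by spaces (the suffixes themselves are arbitrary examples):
+
+ # Skip the contents of these files, but still index their names
+ recoll_noindex = .iso .o .bak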
+
+  5.4.4. The mimeconf file
+
+   mimeconf specifies how the different mime types are handled for indexing,
+   and which icons are displayed in the recoll result lists.
+
+   Changing the parameters in the [index] section is probably not a good idea
+   except if you are a Recoll developer.
+
+   The [icons] section allows you to change the icons which are displayed by
+   recoll in the result lists. The values are the basenames of the PNG images
+   inside the iconsdir directory (specified in recoll.conf).
+
+  5.4.5. The mimeview file
+
+   mimeview specifies which programs are started when you click on an Open
+   link in a result list. For example, HTML is normally displayed using
+   firefox, but you may prefer Konqueror, or your openoffice.org program
+   might be named ooffice instead of openoffice.
+
+   Changes to this file can be done by direct editing, or through the recoll
+   GUI preferences dialog.
+
+   If Use desktop preferences to choose document editor is checked in the
+   Recoll GUI preferences, all mimeview entries will be ignored except the
+   one labelled application/x-all (which is set to use xdg-open by default).
+
+   In this case, the xallexcepts top level variable defines a list of mime
+   type exceptions which will be processed according to the local entries
+   instead of being passed to the desktop. This is so that specific Recoll
+   options such as a page number or a search string can be passed to
+   applications that support them, such as the evince viewer.
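+
+   As a sketch, the following top-level line would keep PDF, Postscript and
+   DVI files under the control of the local entries (assuming the usual
+   mime type names for these formats, and a space-separated list):
+
+ # Placed at the top level, before the [view] section
+ xallexcepts = application/pdf application/postscript application/x-dvi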
+
+   As for the other configuration files, the normal usage is to have a
+   mimeview inside your own configuration directory, with just the
+   non-default entries, which will override those from the central
+   configuration file.
+
+   All viewer definition entries must be placed under a [view] section.
+
+   The keys in the file are normally mime types. You can add an application
+   tag to specialize the choice for an area of the filesystem (using a
+   localfields specification in recoll.conf). The syntax for the key is
+   mimetype|tag.
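+
+   For example, assuming a hypothetical gnus tag set for a mail directory
+   through the localfields parameter described with the main configuration
+   file, and a made-up my-gnus-viewer program:
+
+ # Main configuration file, inside the section for the mail directory
+ localfields = rclaptg=gnus
+
+ # mimeview: entry used only for documents carrying the gnus tag
+ [view]
+ message/rfc822|gnus = my-gnus-viewer %f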
+
+   The nouncompforviewmts entry (placed at the top level, outside of the
+   [view] section) holds a list of mime types that should not be
+   uncompressed before starting the viewer (if they are found compressed,
+   e.g. mydoc.doc.gz).
+
+   The right side of each assignment holds a command to be executed for
+   opening the file. The following substitutions are performed:
+
+     o %D. Document date
+
+     o %f. File name. This may be the name of a temporary file if it was
+       necessary to create one (e.g. to extract a subdocument from a
+       container).
+
+     o %F. Original file name. Same as %f except if a temporary file is used.
+
+     o %i. Internal path, for subdocuments of containers. The format depends
+       on the container type. If this appears in the command line, Recoll
+       will not create a temporary file to extract the subdocument, expecting
+       the called application (possibly a script) to be able to handle it.
+
+     o %M. Mime type
+
+     o %p. Page index. Only significant for a subset of document types,
+       currently only PDF, Postscript and DVI files. Can be used to start the
+       editor at the right page for a match or snippet.
+
+     o %s. Search term. The value will only be set for documents with indexed
+       page numbers (e.g. PDF). The value will be one of the matched search
+       terms. This allows, for example, pre-setting the value in the "Find"
+       entry inside Evince, for easy highlighting of the term.
+
+     o %U, %u. Url.
+
+   In addition to the predefined values above, all strings like %(fieldname)
+   will be replaced by the value of the field named fieldname for the
+   document. This could be used in combination with field customisation to
+   help with opening the document.
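+
+   To illustrate the substitutions, here is a sketch of two [view] entries.
+   The evince options are assumptions about that program's command line
+   (check evince --help), and the entries may differ from the defaults
+   shipped with your Recoll version:
+
+ [view]
+ # Open PDF files at the matching page, with the search term preset
+ application/pdf = evince --page-index=%p --find=%s %f
+ # Pass a URL instead of a file name
+ text/html = firefox %u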
+
+  5.4.6. Examples of configuration adjustments
+
+    5.4.6.1. Adding an external viewer for a non-indexed type
+
+   Imagine that you have some kind of file which does not have indexable
+   content, but for which you would like to have a functional Open link in
+   the result list (when found by file name). The file names end in .blob and
+   can be displayed by application blobviewer.
+
+   You need two entries in the configuration files for this to work:
+
+     o In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the
+       following line:
+
+ .blob = application/x-blobapp
+
+       Note that the mime type is made up here, and you could call it
+       diesel/oil just the same.
+
+     o In $RECOLL_CONFDIR/mimeview under the [view] section, add:
+
+ application/x-blobapp = blobviewer %f
+
+       We are supposing that blobviewer wants a file name parameter here; you
+       would use %u if it liked URLs better.
+
+   If you just wanted to change the application used by Recoll to display a
+   mime type which it already knows, you would just need to edit mimeview.
+   The entries you add in your personal file override those in the central
+   configuration, which you do not need to alter. mimeview can also be
+   modified from the GUI.
+
+    5.4.6.2. Adding indexing support for a new file type
+
+   Let us now imagine that the above .blob files actually contain indexable
+   text and that you know how to extract it with a command line program.
+   Getting Recoll to index the files is easy. You need to perform the above
+   alteration, and also to add data to the mimeconf file (typically in
+   ~/.recoll/mimeconf):
+
+     o Under the [index] section, add the following line (more about the
+       rclblob indexing script later):
+
+ application/x-blobapp = exec rclblob
+
+     o Under the [icons] section, you should choose an icon to be displayed
+       for the files inside the result lists. Icons are normally 64x64 pixel
+       PNG files which live in /usr/[local/]share/recoll/images.
+
+     o Under the [categories] section, you should add the mime type where it
+       makes sense (you can also create a category). Categories may be used
+       for filtering in advanced search.
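+
+   For the [icons] step, the entry could look like the following sketch,
+   which assumes that you have created a blob.png image (a made-up name)
+   inside the icons directory:
+
+ [icons]
+ # The value is the image base name, assumed to be given without .png
+ application/x-blobapp = blob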
+
+   The rclblob filter should be an executable program or script which exists
+   inside /usr/[local/]share/recoll/filters. It will be given a file name as
+   argument and should output the text or HTML contents on the standard
+   output.
+
+   The filter programming section describes in more detail how to write a
+   filter.
+