--- a/src/INSTALL
+++ b/src/INSTALL
@@ -2,3 +2,1070 @@
More documentation can be found in the doc/ directory or at http://www.recoll.org
+
+Chapter 5. Installation and configuration
+
+5.1. Installing a binary copy
+
+ There are three types of binary Recoll installations:
+
+ o Through your system's normal software distribution framework (e.g.
+ Debian/Ubuntu apt, FreeBSD ports, etc.).
+
+ o From a package downloaded from the Recoll web site.
+
+ o From a prebuilt tree downloaded from the Recoll web site.
+
+ In all cases, the strict software dependencies (e.g. on Xapian or iconv)
+ will be satisfied automatically, so you should not have to worry about them.
+
+ You will only have to check or install supporting applications for the
+ file types that you want to index beyond those that are natively processed
+ by Recoll (text, HTML, email files, and a few others).
+
+ You may also want to have a look at the configuration section (though
+ this may not be necessary for a quick test with default parameters). Most
+ parameters can be set more conveniently from the GUI.
+
+ 5.1.1. Installing through a package system
+
+ If you use a BSD-type port system or a prebuilt package (DEB, RPM,
+ manually or through the system software configuration utility), just
+ follow the usual procedure for your system.
+
+ 5.1.2. Installing a prebuilt Recoll
+
+ The unpackaged binary versions on the Recoll web site are just compressed
+ tar files of a build tree, where only the useful parts were kept
+ (executables and sample configuration).
+
+ The executable binary files are built with a static link to libxapian and
+ libiconv, to make installation easier (no dependencies).
+
+ After extracting the tar file, you can proceed with installation as if you
+ had built the package from source (that is, just type make install). The
+ binary trees are built for installation to /usr/local.
+
+5.2. Supporting packages
+
+ Recoll uses external applications to index some file types. You need to
+ install them for the file types that you wish to have indexed. These are
+ optional run-time dependencies: none is needed for building or running
+ Recoll, except for indexing its specific file type.
+
+ After an indexing pass, the commands that were found missing can be
+ displayed from the recoll File menu. The list is stored in the missing
+ text file inside the configuration directory.
+
+ A list of common file types which need external commands follows. Many of
+ the filters need the iconv command, which is not always listed as a
+ dependency.
+
+ Please note that, due to the relatively dynamic nature of this
+ information, the most up to date version is now kept on the Recoll helper
+ applications page along with links to the home pages or best
+ source/patches pages, and misc tips. The list below is not updated often
+ and may be quite stale.
+
+ For many Linux distributions, most of the commands listed can be installed
+ from the package repositories. However, the packages are sometimes
+ outdated, or not the best version for Recoll, so you should take a look at
+ the Recoll helper applications page if a file type is important to you.
+
+ As of Recoll release 1.14, a number of XML-based formats that were handled
+ by ad hoc filter code now use the xsltproc command, which usually comes
+ with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.
+
+ Now for the list:
+
+ o Openoffice files need unzip and xsltproc.
+
+ o PDF files need pdftotext which is part of the Xpdf or Poppler
+ packages.
+
+ o Postscript files need pstotext. The original version has an issue with
+ shell characters in file names, which is corrected in recent packages.
+ See the Recoll helper applications page for more detail.
+
+ o MS Word needs antiword. It is also useful to have wvWare installed as
+ it may be used as a fallback for some files which antiword does not
+ handle.
+
+ o MS Excel and PowerPoint need catdoc.
+
+ o MS Open XML (docx) needs xsltproc.
+
+ o Wordperfect files need wpd2html from the libwpd (or libwpd-tools on
+ Ubuntu) package.
+
+ o RTF files need unrtf, which, in its standard version, has much trouble
+ with non-western character sets. Check the Recoll helper applications
+ page.
+
+ o TeX files need untex or detex. Check the Recoll helper applications
+ page for sources if it's not packaged for your distribution.
+
+ o dvi files need dvips.
+
+ o djvu files need djvutxt and djvused from the DjVuLibre package.
+
+ o Audio files: Recoll releases before 1.13 used the id3info command from
+ the id3lib package to extract mp3 tag information, metaflac (standard
+ flac tools) for flac files, and ogginfo (vorbis tools) for ogg files.
+ Releases 1.14 and later use a single Python filter based on mutagen
+ for all audio file types.
+
+ o Pictures: Recoll uses the Exiftool Perl package to extract tag
+ information. Most image file formats are supported. Note that there
+ may not be much interest in indexing the technical tags (image size,
+ aperture, etc.). This is only of interest if you store personal tags
+ or textual descriptions inside the image files.
+
+ o chm: files in Microsoft help format need Python and the pychm module
+ (which needs chmlib).
+
+ o ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar
+ module. icalendar is not needed for newer versions, which use internal
+ code.
+
+ o Zip archives need Python (and the standard zipfile module).
+
+ o Rar archives need Python, the rarfile Python module and the unrar
+ utility.
+
+ o Midi karaoke files need Python and the Midi module.
+
+ o Konqueror webarchive files need Python (uses the tarfile module).
+
+ o mimehtml web archive format (support based on the email filter, which
+ introduces some mild weirdness, but still usable).
+
+ Text, HTML, email folders, and Scribus files are processed internally. Lyx
+ is used to index Lyx files. Many filters need iconv and the standard sed
+ and awk.
+
+5.3. Building from source
+
+ 5.3.1. Prerequisites
+
+ C++ compiler. Up to Recoll version 1.13.04, its absence can manifest
+ itself by strange messages about a missing iconv_open.
+
+ Development files for Xapian core.
+
+ Important
+
+ If you are building Xapian for an older CPU (before Pentium 4 or Athlon
+ 64), you need to add the --disable-sse flag to the configure command.
+ Otherwise, all Xapian applications will crash with an illegal instruction
+ error.
+
+ Development files for Qt.
+
+ Development files for X11 and zlib.
+
+ Check the Recoll download page for up to date version information.
+
+ You will most probably be able to find a binary package for Qt for your
+ system. You may have to compile Xapian but this is not difficult (if you
+ are using FreeBSD, there is a port).
+
+ You may also need libiconv. Recoll currently uses version 1.9 (this should
+ not be critical). On Linux systems, the iconv interface is part of libc
+ and you should not need to do anything special.
+
+ 5.3.2. Building
+
+ Recoll has been built on Linux, FreeBSD, Mac OS X, and Solaris. Most
+ versions after 2005 should be ok, and maybe some older ones too (Solaris 8
+ is ok). If you build on another system and need to modify things, I would
+ very much welcome patches.
+
+ Depending on the Qt 3 configuration on your system, you may have to set
+ the QTDIR and QMAKESPECS variables in your environment:
+
+ o QTDIR should point to the directory above the one that holds the Qt
+ include files (e.g. if qt.h is /usr/local/qt/include/qt.h, QTDIR should
+ be /usr/local/qt).
+
+ o QMAKESPECS should be set to the name of one of the Qt mkspecs
+ sub-directories (e.g. linux-g++).
+
+ On many Linux systems, QTDIR is set by the login scripts, and QMAKESPECS
+ is not needed because there is a default link in mkspecs/.
+
+ Neither QTDIR nor QMAKESPECS should be needed with Qt 4: configuration
+ details are entirely determined by qmake (which is quite often installed
+ as qmake-qt4).
+
+ Configure options:
+
+ o --without-aspell will disable the code for phonetic matching of search
+ terms.
+
+ o --with-fam or --with-inotify will enable the code for real time
+ indexing. Inotify support is enabled by default on recent Linux
+ systems.
+
+ o --disable-webkit is available from version 1.17 to implement the
+ result list with a Qt QTextBrowser instead of a WebKit widget, if you
+ do not want to or cannot depend on the latter.
+
+ o --enable-xattr will enable code to fetch data from file extended
+ attributes. This is only useful if some application stores data in
+ there, and it also needs some simple configuration (see comments in
+ the fields configuration file).
+
+ o --enable-camelcase will enable splitting camelCase words. This is not
+ enabled by default as it has the unfortunate side-effect of making
+ some phrase searches quite confusing: e.g. "MySQL manual" would be
+ matched by "MySQL manual" and "my sql manual" but not "mysql manual"
+ (only inside phrase searches).
+
+ o --with-file-command Specify the version of the 'file' command to use
+ (e.g. --with-file-command=/usr/local/bin/file). Can be useful to enable
+ the GNU version on systems where the native one is bad.
+
+ o --disable-qtgui Disable the Qt interface. Will allow building the
+ indexer and the command line search program in absence of a Qt
+ environment.
+
+ o --disable-x11mon Disable X11 connection monitoring inside recollindex.
+ Together with --disable-qtgui, this allows building recoll without Qt
+ and X11.
+
+ o Of course, the usual autoconf configure options, like --prefix, apply.
+
+ Normal procedure:
+
+ cd recoll-xxx
+ ./configure
+ make
+ (practices usual hardship-repelling invocations)
+
+
+ There is little auto-configuration. The configure script will mainly link
+ one of the system-specific files in the mk directory to mk/sysconf. If
+ your system is not known yet, it will tell you as much, and you may want
+ to manually copy and modify one of the existing files (the new file name
+ should be the output of uname -s).
+
+ 5.3.3. Installation
+
+ Either type make install or execute recollinstall prefix, in the root of
+ the source tree. This will copy the commands to prefix/bin and the sample
+ configuration files, scripts and other shared data to prefix/share/recoll.
+
+ If the installation prefix given to recollinstall is different from either
+ the system default or the value which was specified when executing
+ configure (as in configure --prefix /some/path), you will have to set the
+ RECOLL_DATADIR environment variable to indicate where the shared data is
+ to be found (e.g. for (ba)sh: export
+ RECOLL_DATADIR=/some/path/share/recoll).
+
+ You can then proceed to configuration.
+
+5.4. Configuration overview
+
+ Most of the parameters specific to the recoll GUI are set through the
+ Preferences menu and stored in the standard Qt place
+ ($HOME/.config/Recoll.org/recoll.conf). You probably do not want to edit
+ this by hand.
+
+ Recoll indexing options are set inside text configuration files located in
+ a configuration directory. There can be several such directories, each of
+ which defines the parameters for one index.
+
+ The configuration files can be edited by hand or through the Index
+ configuration dialog (Preferences menu). The GUI tool will try to respect
+ your formatting and comments as much as possible, so it is quite possible
+ to use both ways.
+
+ The most accurate documentation for the configuration parameters is given
+ by comments inside the default files, and we will just give a general
+ overview here.
+
+ For each index, there are two sets of configuration files. System-wide
+ configuration files are kept in a directory named like
+ /usr/[local/]share/recoll/examples, and define default values, shared by
+ all indexes. For each index, a parallel set of files defines the
+ customized parameters.
+
+ The default location of the configuration is the .recoll directory in your
+ home. Most people will only use this directory.
+
+ This location can be changed, or others can be added with the
+ RECOLL_CONFDIR environment variable or the -c option parameter to recoll
+ and recollindex.
+
+ If the .recoll directory does not exist when recoll or recollindex are
+ started, it will be created with a set of empty configuration files.
+ recoll will give you a chance to edit the configuration file before
+ starting indexing. recollindex will proceed immediately. To avoid
+ mistakes, the automatic directory creation will only occur for the default
+ location, not if -c or RECOLL_CONFDIR were used (in the latter cases, you
+ will have to create the directory).
+
+ All configuration files share the same format. For example, a short
+ extract of the main configuration file might look as follows:
+
+ # Space-separated list of directories to index.
+ topdirs = ~/docs /usr/share/doc
+
+ [~/somedirectory-with-utf8-txt-files]
+ defaultcharset = utf-8
+
+
+ There are three kinds of lines:
+
+ o Comment (starts with #) or empty.
+
+ o Parameter assignment (name = value).
+
+ o Section definition ([somedirname]).
+
+ Depending on the type of configuration file, section definitions either
+ separate groups of parameters or allow redefining some parameters for a
+ directory sub-tree. They stay in effect until another section definition,
+ or the end of file, is encountered. Some of the parameters used for
+ indexing are looked up hierarchically from the current directory location
+ upwards. Not all parameters can be meaningfully redefined; this is
+ specified for each in the next section.
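+
+ As a minimal sketch of how a section scopes a parameter (the directory
+ name is hypothetical), the following sets a global value and then
+ redefines it for one sub-tree. The section stays in effect until another
+ section definition or the end of the file:
+
+ ```conf
+ # Global value, used wherever no section redefines it.
+ defaultcharset = iso8859-1
+
+ # Hypothetical sub-tree holding UTF-8 text files.
+ [~/docs/utf8-notes]
+ defaultcharset = utf-8
+ ```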
+
+ When found at the beginning of a file path, the tilde character (~) is
+ expanded to the name of the user's home directory, as a shell would do.
+
+ White space is used for separation inside lists. List elements with
+ embedded spaces can be quoted using double-quotes.
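+
+ For illustration only (the paths are made up), a list whose second
+ element contains a space could be written as follows:
+
+ ```conf
+ # Three list elements; the quoted one has an embedded space.
+ topdirs = ~/docs "/srv/shared documents" /usr/share/doc
+ ```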
+
+ Encoding issues. Most of the configuration parameters are plain ASCII. Two
+ particular sets of values may cause encoding issues:
+
+ o File path parameters may contain non-ASCII characters and should use
+ the exact same byte values as found in the file system directory.
+ Usually, this means that the configuration file should use the system
+ default locale encoding.
+
+ o The unac_except_trans parameter should be encoded in UTF-8. If your
+ system locale is not UTF-8, and you need to also specify non-ASCII
+ file paths, this poses a difficulty because common text editors cannot
+ handle multiple encodings in a single file. In this relatively
+ unlikely case, you can edit the configuration file as two separate
+ text files with appropriate encodings, and concatenate them to create
+ the complete configuration.
+
+ 5.4.1. Main configuration file
+
+ recoll.conf is the main configuration file. It defines things like what to
+ index (top directories and things to ignore), and the default character
+ set to use for document types which do not specify it internally.
+
+ The default configuration will index your home directory. If this is not
+ appropriate, start recoll to create a blank configuration, click Cancel,
+ and edit the configuration file before restarting the command. This will
+ start the initial indexing, which may take some time.
+
+ Most of the following parameters can be changed from the Index
+ Configuration menu in the recoll interface. Some can only be set by
+ editing the configuration file.
+
+ 5.4.1.1. Parameters affecting what documents we index:
+
+ topdirs
+
+ Specifies the list of directories or files to index (recursively
+ for directories). You can use symbolic links as elements of this
+ list. See the followLinks option about following symbolic links
+ found under the top elements (not followed by default).
+
+ skippedNames
+
+ A space-separated list of patterns for names of files or
+ directories that should be completely ignored. The list defined in
+ the default file is:
+
+ skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
+ *~ .beagle .git .hg .bzr loop.ps .xsession-errors \
+ .recoll* xapiandb recollrc recoll.conf
+
+ The list can be redefined at any sub-directory in the indexed
+ area.
+
+ The top-level directories are not affected by this list (that is,
+ a directory in topdirs might match and would still be indexed).
+
+ The list in the default configuration does not exclude hidden
+ directories (names beginning with a dot), which means that it may
+ index quite a few things that you do not want. On the other hand,
+ email user agents like thunderbird usually store messages in
+ hidden directories, and you probably want this indexed. One
+ possible solution is to have .* in skippedNames, and add things
+ like ~/.thunderbird or ~/.evolution in topdirs.
+
+ Not even the file names are indexed for patterns in this list. See
+ the recoll_noindex variable in mimemap for an alternative approach
+ which indexes the file names.
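+
+ The hidden-directory approach described above might look like the
+ following sketch. It assumes that a bare ~ is accepted for the home
+ directory and that redefining skippedNames replaces the default list
+ (which is why part of that list is repeated here); check the comments
+ in the default file before relying on either assumption:
+
+ ```conf
+ # Index the home directory, plus the hidden mail store, which is
+ # listed explicitly because top-level directories are not affected
+ # by skippedNames.
+ topdirs = ~ ~/.thunderbird
+ # Skip every hidden entry (.*) in addition to some of the defaults.
+ skippedNames = .* #* bin CVS Cache cache* caughtspam tmp \
+ *~ loop.ps xapiandb recollrc recoll.conf
+ ```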
+
+ skippedPaths and daemSkippedPaths
+
+ A space-separated list of patterns for paths of files or
+ directories that should be skipped. There is no default in the
+ sample configuration file, but the code always adds the
+ configuration and database directories in there.
+
+ skippedPaths is used both by batch and real time indexing.
+ daemSkippedPaths can be used to specify things that should be
+ indexed at startup, but not monitored.
+
+ Example of use for skipping text files only in a specific
+ directory:
+
+ skippedPaths = ~/somedir/*.txt
+
+
+ skippedPathsFnmPathname
+
+ The values in the *skippedPaths variables are matched by default
+ with fnmatch(3), with the FNM_PATHNAME and FNM_LEADING_DIR flags.
+ This means that '/' characters must be matched explicitly. You
+ can set skippedPathsFnmPathname to 0 to disable the use of
+ FNM_PATHNAME (meaning that /*/dir3 will match /dir1/dir2/dir3).
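+
+ A short sketch of the case just described, reusing the same
+ illustrative pattern:
+
+ ```conf
+ # With FNM_PATHNAME disabled, '*' can also match '/' characters,
+ # so this pattern matches /dir1/dir2/dir3.
+ skippedPathsFnmPathname = 0
+ skippedPaths = /*/dir3
+ ```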
+
+ followLinks
+
+ Specifies if the indexer should follow symbolic links while
+ walking the file tree. The default is to ignore symbolic links to
+ avoid multiple indexing of linked files. No effort is made to
+ avoid duplication when this option is set to true. This option can
+ be set individually for each of the topdirs members by using
+ sections. It cannot be changed below the topdirs level.
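+
+ As a sketch (the directories are hypothetical, and writing the
+ boolean values as 0 and 1 is an assumption based on the other
+ examples in this chapter), links could be followed under a single
+ top directory only:
+
+ ```conf
+ topdirs = ~/docs ~/projects
+ # Default behaviour: do not follow symbolic links.
+ followLinks = 0
+
+ # Follow links only under this member of topdirs.
+ [~/projects]
+ followLinks = 1
+ ```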
+
+ indexedmimetypes
+
+ Recoll normally indexes any file which it knows how to read. This
+ list lets you restrict the indexed mime types to what you specify.
+ If the variable is unspecified or the list empty (the default),
+ all supported types are processed.
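+
+ For example, to restrict indexing to three common types (a sketch
+ which assumes the usual white space separation for lists):
+
+ ```conf
+ # Only plain text, HTML and PDF documents will be indexed.
+ indexedmimetypes = text/plain text/html application/pdf
+ ```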
+
+ compressedfilemaxkbs
+
+ Size limit for compressed (.gz or .bz2) files. These need to be
+ decompressed in a temporary directory for identification, which
+ can be very wasteful if 'uninteresting' big compressed files are
+ present. Negative means no limit, 0 means no processing of any
+ compressed file. Defaults to -1.
+
+ textfilemaxmbs
+
+ Maximum size for text files. Very big text files are often
+ uninteresting logs. Set to -1 to disable (default 20MB).
+
+ textfilepagekbs
+
+ If set to other than -1, text files will be indexed as multiple
+ documents of the given page size. This may be useful if you do
+ want to index very big text files as it will both reduce memory
+ usage at index time and help with loading data to the preview
+ window. A size of a few megabytes would seem reasonable (default:
+ 1MB).
+
+ membermaxkbs
+
+ This defines the maximum size in kilobytes for an archive member
+ (zip, tar or rar at the moment). Bigger entries will be skipped.
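+
+ The size limits above could be combined as in the following sketch.
+ All values are arbitrary illustrations, and the units are inferred
+ from the parameter names (kbs for kilobytes, mbs for megabytes):
+
+ ```conf
+ # Do not process compressed files bigger than about 20 MB.
+ compressedfilemaxkbs = 20000
+ # Do not index text files bigger than 50 MB.
+ textfilemaxmbs = 50
+ # Index big text files as separate pages of 1000 KB each.
+ textfilepagekbs = 1000
+ # Skip archive members bigger than about 50 MB.
+ membermaxkbs = 50000
+ ```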
+
+ indexallfilenames
+
+ Recoll indexes file names in a special section of the database to
+ allow specific file names searches using wild cards. This
+ parameter decides if file name indexing is performed only for
+ files with mime types that would qualify them for full text
+ indexing, or for all files inside the selected subtrees,
+ independently of mime type.
+
+ usesystemfilecommand
+
+ Decide if we use the file -i system command as a final step for
+ determining the mime type for a file (the main procedure uses
+ suffix associations as defined in the mimemap file). This can be
+ useful for files with suffix-less names, but it will also cause
+ the indexing of many bogus "text" files.
+
+ processwebqueue
+
+ If this is set, process the directory where Web browser plugins
+ copy visited pages for indexing.
+
+ webqueuedir
+
+ The path to the web indexing queue. This is hard-coded in the
+ Firefox plugin as ~/.recollweb/ToIndex so there should be no need
+ to change it.
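+
+ A minimal sketch enabling the web queue (writing the boolean as 1 is
+ an assumption; the path is the one quoted above and normally does
+ not need to be set):
+
+ ```conf
+ processwebqueue = 1
+ webqueuedir = ~/.recollweb/ToIndex
+ ```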
+
+ 5.4.1.2. Parameters affecting how we generate terms:
+
+ Changing some of these parameters will imply a full reindex. Also, when
+ using multiple indexes, it may not make sense to search indexes that don't
+ share the values for these parameters, because they usually affect both
+ search and index operations.
+
+ indexStripChars
+
+ Decide if we strip characters of diacritics and convert them to
+ lower-case before terms are indexed. If we don't, searches
+ sensitive to case and diacritics can be performed, but the index
+ will be bigger, and some marginal weirdness may sometimes occur.
+ The default is a stripped index (indexStripChars = 1) for now.
+ When using multiple indexes for a search, this parameter must be
+ defined identically for all. Changing the value implies an index
+ reset.
+
+ maxTermExpand
+
+ Maximum expansion count for a single term (e.g.: when using
+ wildcards). The default of 10000 is reasonable and will avoid
+ queries that appear frozen while the engine is walking the term
+ list.
+
+ maxXapianClauses
+
+ Maximum number of elementary clauses we can add to a single Xapian
+ query. In some cases, the result of term expansion can be
+ multiplicative, and we want to avoid using excessive memory. The
+ default of 100 000 should be both high enough in most cases and
+ compatible with current typical hardware configurations.
+
+ nonumbers
+
+ If this is set to true, no terms will be generated for numbers. For
+ example "123", "1.5e6" and "192.168.1.4" would not be indexed
+ ("value123" would still be). Numbers are often quite interesting
+ to search for, and this should probably not be set except for
+ special situations, e.g. scientific documents with huge amounts of
+ numbers in them. This can only be set for a whole index, not for a
+ subtree.
+
+ nocjk
+
+ If this is set to true, specific East Asian (Chinese, Korean, Japanese)
+ character/word splitting is turned off. This will save a small
+ amount of cpu if you have no CJK documents. If your document base
+ does include such text but you are not interested in searching it,
+ setting nocjk may be a significant time and space saver.
+
+ cjkngramlen
+
+ This lets you adjust the size of n-grams used for indexing CJK
+ text. The default value of 2 is probably appropriate in most
+ cases. A value of 3 would allow more precision and efficiency on
+ longer words, but the index will be approximately twice as large.
+
+ indexstemminglanguages
+
+ A list of languages for which the stem expansion databases will be
+ built. See recollindex(1) or use the recollindex -l command for
+ possible values. You can add a stem expansion database for a
+ different language by using recollindex -s, but it will be deleted
+ during the next indexing. Only languages listed in the
+ configuration file are permanent.
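+
+ As an illustration (the language names are assumed to be among
+ those reported by recollindex -l on your system), permanent stem
+ expansion databases for two languages could be requested with:
+
+ ```conf
+ indexstemminglanguages = english french
+ ```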
+
+ defaultcharset
+
+ The name of the character set used for files that do not contain a
+ character set definition (e.g. plain text files). This can be
+ redefined for any sub-directory. If it is not set at all, the
+ character set used is the one defined by the NLS environment
+ (LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
+
+ unac_except_trans
+
+ This is a list of characters, encoded in UTF-8, which should be
+ handled specially when converting text to unaccented lowercase.
+ For example, in Swedish, the letter a with diaeresis has full
+ alphabet citizenship and should not be turned into an a. Each
+ element in the space-separated list has the special character as
+ first element and the translation following. The handling of both
+ the lowercase and upper-case versions of a character should be
+ specified, as membership in the list will turn off both standard
+ accent and case processing. Example for Swedish:
+
+ unac_except_trans = åå Åå ää Ää öö Öö
+
+
+ Note that the translation is not limited to a single character;
+ you could very well have something like üue in the list.
+
+ The default value set for unac_except_trans can't be listed here
+ because I have trouble with SGML and UTF-8, but it only contains
+ ligature decompositions: German ss, oe, ae, fi, fl.
+
+ This parameter can't be defined for subdirectories, it is global,
+ because there is no way to do otherwise when querying. If you have
+ document sets which would need different values, you will have to
+ index and query them separately.
+
+ maildefcharset
+
+ This can be used to define the default character set specifically
+ for email messages which don't specify it. This is mainly useful
+ for readpst (libpst) dumps, which are utf-8 but do not say so.
+
+ localfields
+
+ This allows setting fields for all documents under a given
+ directory. Typical usage would be to set an "rclaptg" field, to be
+ used in mimeview to select a specific viewer. If several fields
+ are to be set, they should be separated with a colon (':')
+ character (which there is currently no way to escape). For example:
+ localfields = rclaptg=gnus:other = val, then select a specific
+ viewer with mimetype|tag=... in mimeview.
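+
+ A sketch of the recoll.conf side only (the directory is
+ hypothetical; the matching mimeview entry is not shown):
+
+ ```conf
+ # Tag every document under this directory for a gnus-specific viewer.
+ [~/Mail]
+ localfields = rclaptg=gnus
+ ```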
+
+ 5.4.1.3. Parameters affecting where and how we store things:
+
+ dbdir
+
+ The name of the Xapian data directory. It will be created if
+ needed when the index is initialized. If this is not an absolute
+ path, it will be interpreted relative to the configuration
+ directory. The value can have embedded spaces but starting or
+ trailing spaces will be trimmed. You cannot use quotes here.
+
+ idxstatusfile
+
+ The name of the scratch file where the indexer process updates its
+ status. Default: idxstatus.txt inside the configuration directory.
+
+ maxfsoccuppc
+
+ Maximum file system occupation before we stop indexing. The value
+ is a percentage, corresponding to what the "Capacity" df output
+ column shows. The default value is 0, meaning no checking.
+
+ mboxcachedir
+
+ The directory where mbox message offsets cache files are held.
+ This is normally $RECOLL_CONFDIR/mboxcache, but it may be useful
+ to share a directory between different configurations.
+
+ mboxcacheminmbs
+
+ The minimum mbox file size over which we cache the offsets. There
+ is really no sense in caching offsets for small files. The default
+ is 5 MB.
+
+ webcachedir
+
+ This is only used by the web browser plugin indexing code, and
+ defines where the cache for visited pages will live. Default:
+ $RECOLL_CONFDIR/webcache
+
+ webcachemaxmbs
+
+ This is only used by the web browser plugin indexing code, and
+ defines the maximum size for the web page cache. Default: 40 MB.
+
+ idxflushmb
+
+ Threshold (megabytes of new text data) where we flush from memory
+ to disk index. Setting this can help control memory usage. A value
+ of 0 means no explicit flushing, letting Xapian use its own
+ default, which is flushing every 10000 (or XAPIAN_FLUSH_THRESHOLD)
+ documents, which gives little memory usage control, as memory
+ usage also depends on average document size. The default value is
+ 10, and it is probably a bit low. If your system usually has free
+ memory, you can try higher values between 20 and 80. In my
+ experience, values beyond 100 are always counterproductive.
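+
+ The storage parameters above might be combined as follows. This is
+ only a sketch with arbitrary values, not a recommendation:
+
+ ```conf
+ # Relative path: interpreted relative to the configuration directory.
+ dbdir = xapiandb
+ # Stop indexing when the file system is more than 90% full.
+ maxfsoccuppc = 90
+ # Flush to the disk index after about 50 MB of new text data.
+ idxflushmb = 50
+ ```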
+
+ 5.4.1.4. Miscellaneous parameters:
+
+ autodiacsens
+
+ If the index is not stripped, decide if we automatically trigger
+ diacritics sensitivity if the search term has accented characters
+ (not in unac_except_trans). Otherwise, you need to use the query
+ language and the D modifier to specify diacritics sensitivity.
+ Default is no.
+
+ autocasesens
+
+ If the index is not stripped, decide if we automatically trigger
+ character case sensitivity if the search term has upper-case
+ characters in any but the first position. Otherwise, you need to
+ use the query language and the C modifier to specify character-case
+ sensitivity. Default is yes.
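+
+ A sketch of an unstripped index with both automatic triggers
+ enabled (writing the boolean values as 0 and 1 is an assumption,
+ following the indexStripChars = 1 notation used earlier):
+
+ ```conf
+ # Keep case and diacritics in the index. Changing this value
+ # implies an index reset.
+ indexStripChars = 0
+ # Accented or upper-case search terms trigger sensitive matching.
+ autodiacsens = 1
+ autocasesens = 1
+ ```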
+
+ loglevel, daemloglevel
+
+ Verbosity level for recoll and recollindex. A value of 4 lists
+ quite a lot of debug/information messages. 2 only lists errors.
+ The daem version is specific to the indexing monitor daemon.
+
+ logfilename, daemlogfilename
+
+ Where the messages should go. 'stderr' can be used as a special
+ value, and is the default. The daem version is specific to the
+ indexing monitor daemon.
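+
+ For instance (the log file path is hypothetical), errors only for
+ normal use and more verbose output for the monitor daemon:
+
+ ```conf
+ # recoll and batch recollindex: errors only, on stderr.
+ loglevel = 2
+ logfilename = stderr
+ # Indexing monitor daemon: debug/information messages, to a file.
+ daemloglevel = 4
+ daemlogfilename = /tmp/recollindex-monitor.log
+ ```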
+
+ mondelaypatterns
+
+ This allows specifying wildcard path patterns (processed with
+ fnmatch(3) with no flags set) to match files which change too often
+ and for which a delay should be observed before re-indexing. This is a
+ space-separated list, each entry being a pattern and a time in
+ seconds, separated by a colon. You can use double quotes if a path
+ entry contains white space. Example:
+
+ mondelaypatterns = *.log:20 "this one has spaces*:10"
+
+
+ monixinterval
+
+ Minimum interval (seconds) for processing the indexing queue. The
+ real time monitor does not process each event when it comes in,
+ but will wait this time for the queue to accumulate, to diminish
+ overhead and to aggregate multiple events to the same
+ file. Default: 30 seconds.
+
+ monauxinterval
+
+ Period (in seconds) at which the real time monitor will regenerate
+ the auxiliary databases (spelling, stemming) if needed. The
+ default is one hour.
+
+ monioniceclass, monioniceclassdata
+
+ These allow defining the ionice class and data used by the indexer
+ (default class 3, no data).
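+
+ A sketch of real time monitor tuning, with arbitrary values:
+
+ ```conf
+ # Process the indexing queue at most once a minute.
+ monixinterval = 60
+ # Regenerate the spelling and stemming databases every two hours.
+ monauxinterval = 7200
+ # ionice class 3 is the stated default; shown only for illustration.
+ monioniceclass = 3
+ ```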
+
+ filtermaxseconds
+
+ Maximum filter execution time, after which it is aborted. Some
+ postscript programs just loop...
+
+ filtersdir
+
+ A directory to search for the external filter scripts used to
+ index some types of files. The value should not be changed, except
+ if you want to modify one of the default scripts. The value can be
+ redefined for any sub-directory.
+
+ iconsdir
+
+ The name of the directory where recoll result list icons are
+ stored. You can change this if you want different images.
+
+ idxabsmlen
+
+ Recoll stores an abstract for each indexed file inside the
+ database. The text can come from an actual 'abstract' section in
+ the document or will just be the beginning of the document. It is
+ stored in the index so that it can be displayed inside the result
+ lists without decoding the original file. The idxabsmlen parameter
+ defines the size of the stored abstract. The default value is 250
+ bytes. The search interface gives you the choice to display this
+ stored text or a synthetic abstract built by extracting text
+ around the search terms. If you always prefer the synthetic
+ abstract, you can reduce this value and save a little space.
+
+ aspellLanguage
+
+ Language definitions to use when creating the aspell dictionary.
+ The value must match a set of aspell language definition files.
+ You can type "aspell config" to see where these are installed
+ (look for data-dir). If the variable is not set, the default is to
+ guess the value from your desktop national language environment.
+
+ noaspell
+
+ If this is set, the aspell dictionary generation is turned off.
+ Useful for cases where you don't need the functionality or when it
+ is unusable because aspell crashes during dictionary generation.
+
+ mhmboxquirks
+
+ This allows defining location-related quirks for the mailbox
+ handler. Currently only the tbird flag is defined, and it should
+ be set for directories which hold Thunderbird data, as their
+ folder format is weird.
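+
+ As an illustration only, several of the parameters above could be
+ combined in a personal recoll.conf as sketched below. The numeric values
+ just restate the documented defaults, and the directory name is a
+ hypothetical example. The sketch assumes the usual [directory] section
+ syntax for location-specific values.
+
+ # Real time monitor tuning (values are the documented defaults)
+ monixinterval = 30
+ monauxinterval = 3600
+ monioniceclass = 3
+ # Size of the stored abstract, in bytes
+ idxabsmlen = 250
+ # Hypothetical location holding Thunderbird folders
+ [/home/me/.thunderbird]
+ mhmboxquirks = tbird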
+
+ 5.4.2. The fields file
+
+ This file contains information about dynamic fields handling in Recoll.
+ Some very basic fields have hard-wired behaviour and, for the most part,
+ you should not change the original data inside the fields file. But you
+ can create custom fields fitting your data and handle them just as if
+ they were native ones.
+
+ The fields file has several sections, which each define an aspect of
+ fields processing. Quite often, you'll have to modify several sections to
+ obtain the desired behaviour.
+
+ We will only give a short description here. You should refer to the
+ comments inside the file for more detailed information.
+
+ Field names should be lowercase alphabetic ASCII.
+
+ [prefixes]
+
+ A field becomes indexed (searchable) by having a prefix defined in
+ this section.
+
+ [stored]
+
+ A field becomes stored (displayable inside results) by having its
+ name listed in this section (typically with an empty value).
+
+ [aliases]
+
+ This section defines lists of synonyms for the canonical names
+ used inside the [prefixes] and [stored] sections.
+
+ filter-specific sections
+
+ Some filters may need specific configuration for handling fields.
+ Only the email message filter currently has such a section (named
+ [mail]). It allows indexing arbitrary email headers in addition to
+ the ones indexed by default. Other such sections may appear in the
+ future.
+
+ Here follows a small example of a personal fields file. This would extract
+ a specific email header and use it as a searchable field, with data
+ displayable inside result lists. (Side note: as the email filter does no
+ decoding on the values, only plain ASCII headers can be indexed, and only
+ the first occurrence will be used for headers that occur several times.)
+
+ [prefixes]
+ # Index mailmytag contents (with the given prefix)
+ mailmytag = XMTAG
+
+ [stored]
+ # Store mailmytag inside the document data record (so that it can be
+ # displayed - as %(mailmytag) - in result lists).
+ mailmytag =
+
+ [mail]
+ # Extract the X-My-Tag mail header, and use it internally with the
+ # mailmytag field name
+ x-my-tag = mailmytag
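+
+ If you also wanted a shorter name to be accepted for the same field, an
+ [aliases] entry could be added. This is only a sketch: the mytag alias is
+ a made-up name, and the layout (canonical name on the left, list of
+ synonyms on the right) is assumed from the section description above, so
+ check the comments in the fields file before relying on it.
+
+ [aliases]
+ # Accept mytag as a synonym for the canonical mailmytag name
+ mailmytag = mytag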
+
+ 5.4.3. The mimemap file
+
+ mimemap specifies the file name extension to mime type mappings.
+
+ For file names without an extension, or with an unknown one, the system's
+ file -i command will be executed to determine the mime type (this can be
+ switched off inside the main configuration file).
+
+ The mappings can be specified on a per-subtree basis, which may be useful
+ in some cases. Example: gaim logs have a .txt extension but should be
+ handled specially, which is possible because they are usually all located
+ in one place.
+
+ mimemap also has a recoll_noindex variable which is a list of suffixes.
+ Matching files will be skipped (which avoids unnecessary decompressions or
+ file executions). This is partially redundant with skippedNames in the
+ main configuration file, with a few differences: it will not affect
+ directories, it cannot be made dependent on the file-system location (it
+ is a configuration-wide parameter), and the file names will still be
+ indexed (not even the file names are indexed for patterns in
+ skippedNames). recoll_noindex is used mostly for things known to be
+ unindexable by a given Recoll version. Having it there avoids cluttering
+ the more user-oriented and locally customized skippedNames.
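+
+ A personal mimemap combining these features might look like the sketch
+ below. The suffix list, the directory and the text/x-gaim-log type name
+ are illustrative assumptions, not values taken from the shipped file.
+
+ # Configuration-wide: do not process files with these suffixes
+ recoll_noindex = .o .so
+ # Default mapping
+ .txt = text/plain
+ # Per-subtree mapping for gaim logs
+ [~/.gaim/logs]
+ .txt = text/x-gaim-log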
+
+ 5.4.4. The mimeconf file
+
+ mimeconf specifies how the different mime types are handled for indexing,
+ and which icons are displayed in the recoll result lists.
+
+ Changing the parameters in the [index] section is probably not a good idea
+ except if you are a Recoll developer.
+
+ The [icons] section allows you to change the icons which are displayed by
+ recoll in the result lists. The values are the basenames of the png images
+ inside the iconsdir directory (specified in recoll.conf).
+
+ 5.4.5. The mimeview file
+
+ mimeview specifies which programs are started when you click on an Open
+ link in a result list. For example, HTML is normally displayed using
+ firefox, but you may prefer Konqueror, or your openoffice.org program
+ might be named ooffice instead of openoffice.
+
+ Changes to this file can be done by direct editing, or through the recoll
+ GUI preferences dialog.
+
+ If Use desktop preferences to choose document editor is checked in the
+ Recoll GUI preferences, all mimeview entries will be ignored except the
+ one labelled application/x-all (which is set to use xdg-open by default).
+
+ In this case, the xallexcepts top level variable defines a list of mime
+ type exceptions which will be processed according to the local entries
+ instead of being passed to the desktop. This is so that specific Recoll
+ options such as a page number or a search string can be passed to
+ applications that support them, such as the evince viewer.
+
+ As for the other configuration files, the normal usage is to have a
+ mimeview inside your own configuration directory, with just the
+ non-default entries, which will override those from the central
+ configuration file.
+
+ All viewer definition entries must be placed under a [view] section.
+
+ The keys in the file are normally mime types. You can add an application
+ tag to specialize the choice for an area of the filesystem (using a
+ localfields specification in mimeconf). The syntax for the key is
+ mimetype|tag
+
+ The nouncompforviewmts entry (placed at the top level, outside of the
+ [view] section) holds a list of mime types that should not be
+ uncompressed before starting the viewer (if they are found compressed, for
+ example mydoc.doc.gz).
+
+ The right side of each assignment holds a command to be executed for
+ opening the file. The following substitutions are performed:
+
+ o %D. Document date
+
+ o %f. File name. This may be the name of a temporary file if it was
+ necessary to create one (for example, to extract a subdocument from a
+ container).
+
+ o %F. Original file name. Same as %f except if a temporary file is used.
+
+ o %i. Internal path, for subdocuments of containers. The format depends
+ on the container type. If this appears in the command line, Recoll
+ will not create a temporary file to extract the subdocument, expecting
+ the called application (possibly a script) to be able to handle it.
+
+ o %M. Mime type
+
+ o %p. Page index. Only significant for a subset of document types,
+ currently only PDF, Postscript and DVI files. Can be used to start the
+ editor at the right page for a match or snippet.
+
+ o %s. Search term. The value will only be set for documents with indexed
+ page numbers (for example PDF). The value will be one of the matched
+ search terms. This allows pre-setting the value in the "Find" entry
+ inside Evince, for example, for easy highlighting of the term.
+
+ o %U, %u. URL.
+
+ In addition to the predefined values above, all strings like %(fieldname)
+ will be replaced by the value of the field named fieldname for the
+ document. This could be used in combination with field customisation to
+ help with opening the document.
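+
+ As a sketch of how these pieces fit together, a personal mimeview could
+ contain entries like the following. The viewer commands are examples
+ only: they assume that firefox and evince are installed and that evince
+ accepts the --page-index and --find options.
+
+ # Top level: handle PDF locally even when desktop preferences are used
+ xallexcepts = application/pdf
+ [view]
+ # Open HTML documents by URL
+ text/html = firefox %u
+ # Open PDF files at the matching page, with the search term preset
+ application/pdf = evince --page-index=%p --find=%s %f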
+
+ 5.4.6. Examples of configuration adjustments
+
+ 5.4.6.1. Adding an external viewer for a non-indexed type
+
+ Imagine that you have some kind of file which does not have indexable
+ content, but for which you would like to have a functional Open link in
+ the result list (when found by file name). The file names end in .blob and
+ can be displayed by application blobviewer.
+
+ You need two entries in the configuration files for this to work:
+
+ o In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the
+ following line:
+
+ .blob = application/x-blobapp
+
+ Note that the mime type is made up here, and you could call it
+ diesel/oil just the same.
+
+ o In $RECOLL_CONFDIR/mimeview under the [view] section, add:
+
+ application/x-blobapp = blobviewer %f
+
+ We are supposing that blobviewer wants a file name parameter here. You
+ would use %u if it preferred URLs.
+
+ If you just wanted to change the application used by Recoll to display a
+ mime type which it already knows, you would just need to edit mimeview.
+ The entries you add in your personal file override those in the central
+ configuration, which you do not need to alter. mimeview can also be
+ modified from the GUI.
+
+ 5.4.6.2. Adding indexing support for a new file type
+
+ Let us now imagine that the above .blob files actually contain indexable
+ text and that you know how to extract it with a command line program.
+ Getting Recoll to index the files is easy. You need to perform the above
+ alteration, and also to add data to the mimeconf file (typically in
+ ~/.recoll/mimeconf):
+
+ o Under the [index] section, add the following line (more about the
+ rclblob indexing script later):
+
+ application/x-blobapp = exec rclblob
+
+ o Under the [icons] section, you should choose an icon to be displayed
+ for the files inside the result lists. Icons are normally 64x64 pixel
+ PNG files which live in /usr/[local/]share/recoll/images.
+
+ o Under the [categories] section, you should add the mime type where it
+ makes sense (you can also create a category). Categories may be used
+ for filtering in advanced search.
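+
+ Put together, the additions to a personal mimeconf could look like the
+ sketch below. The blobicon name is hypothetical and stands for a
+ blobicon.png file that you would have to provide in the icons directory.
+
+ [index]
+ # Run the rclblob script to extract text from .blob files
+ application/x-blobapp = exec rclblob
+ [icons]
+ # Basename of the PNG image shown in result lists
+ application/x-blobapp = blobicon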
+
+ The rclblob filter should be an executable program or script which exists
+ inside /usr/[local/]share/recoll/filters. It will be given a file name as
+ its argument and should write the text or HTML contents to the standard
+ output.
+
+ The filter programming section describes in more detail how to write a
+ filter.
+