--- a/src/INSTALL
+++ b/src/INSTALL
@@ -2,1076 +2,3 @@
More documentation can be found in the doc/ directory or at http://www.recoll.org
-
- Recoll user manual
-
- Chapter 5. Installation and configuration
-
- Table of Contents
-
- 5.1. Installing a binary copy
-
- 5.2. Supporting packages
-
- 5.3. Building from source
-
- 5.4. Configuration overview
-
- 5.1. Installing a binary copy
-
- There are three types of binary Recoll installations:
-
- * Through your system's normal software distribution framework (e.g.,
- Debian/Ubuntu apt, FreeBSD ports).
-
- * From a package downloaded from the Recoll web site.
-
- * From a prebuilt tree downloaded from the Recoll web site.
-
- In all cases, the strict software dependencies (e.g., on Xapian or iconv)
- will be satisfied automatically; you should not have to worry about them.
-
- You will only have to check or install supporting applications for the
- file types that you want to index beyond those that are natively processed
- by Recoll (text, HTML, email files, and a few others).
-
- You may also want to look at the configuration section, although this is
- not necessary for a quick test with default parameters. Most parameters
- can be set more conveniently from the GUI.
-
-5.1.1. Installing through a package system
-
- If you use a BSD-type port system or a prebuilt package (DEB, RPM,
- manually or through the system software configuration utility), just
- follow the usual procedure for your system.
-
-5.1.2. Installing a prebuilt Recoll
-
- The unpackaged binary versions on the Recoll web site are just compressed
- tar files of a build tree, where only the useful parts were kept
- (executables and sample configuration).
-
- The executable binary files are built with a static link to libxapian and
- libiconv, to make installation easier (no dependencies).
-
- After extracting the tar file, you can proceed with installation as if you
- had built the package from source (that is, just type make install). The
- binary trees are built for installation to /usr/local.
-
- 5.2. Supporting packages
-
- Recoll uses external applications to index some file types. You need to
- install them for the file types that you wish to have indexed. These are
- optional run-time dependencies: none of them is needed for building or
- running Recoll, only for indexing its specific file type.
-
- After an indexing pass, the commands that were found missing can be
- displayed from the recoll File menu. The list is stored in a text file
- named missing inside the configuration directory.
-
- A list of common file types which need external commands follows. Many of
- the filters need the iconv command, which is not always listed as a
- dependency.
-
- Please note that, because this information changes relatively often, the
- most up-to-date version is now kept on the Recoll helper applications
- page, along with links to the home pages or best source/patch pages, and
- miscellaneous tips. The list below is not updated often and may be quite
- stale.
-
- For many Linux distributions, most of the commands listed can be installed
- from the package repositories. However, the packages are sometimes
- outdated, or not the best version for Recoll, so you should take a look at
- the Recoll helper applications page if a file type is important to you.
-
- As of Recoll release 1.14, a number of XML-based formats that were handled
- by ad hoc filter code now use the xsltproc command, which usually comes
- with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.
-
- Now for the list:
-
- * Openoffice files need unzip and xsltproc.
-
- * PDF files need pdftotext which is part of the Xpdf or Poppler
- packages.
-
- * Postscript files need pstotext. The original version has an issue with
- shell characters in file names, which is corrected in recent packages.
- See the Recoll helper applications page for more detail.
-
- * MS Word needs antiword. It is also useful to have wvWare installed, as
- it may be used as a fallback for some files which antiword does not
- handle.
-
- * MS Excel and PowerPoint need catdoc.
-
- * MS Open XML (docx) needs xsltproc.
-
- * Wordperfect files need wpd2html from the libwpd (or libwpd-tools on
- Ubuntu) package.
-
- * RTF files need unrtf, which, in its standard version, has serious
- trouble with non-western character sets. Check the Recoll helper
- applications page.
-
- * TeX files need untex or detex. Check the Recoll helper applications
- page for sources if it's not packaged for your distribution.
-
- * dvi files need dvips.
-
- * djvu files need djvutxt and djvused from the DjVuLibre package.
-
- * Audio files: Recoll releases before 1.13 used the id3info command from
- the id3lib package to extract mp3 tag information, metaflac (standard
- flac tools) for flac files, and ogginfo (vorbis tools) for ogg files.
- Releases 1.14 and later use a single Python filter based on mutagen
- for all audio file types.
-
- * Pictures: Recoll uses the Exiftool Perl package to extract tag
- information. Most image file formats are supported. Note that there
- may not be much interest in indexing the technical tags (image size,
- aperture, etc.). This is only of interest if you store personal tags
- or textual descriptions inside the image files.
-
- * chm: files in Microsoft help format need Python and the pychm module
- (which needs chmlib).
-
- * ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar
- module. icalendar is not needed for newer versions, which use internal
- code.
-
- * Zip archives need Python (and the standard zipfile module).
-
- * Rar archives need Python, the rarfile Python module and the unrar
- utility.
-
- * Midi karaoke files need Python and the Midi module.
-
- * Konqueror webarchive files need Python (and the tarfile module).
-
- * mimehtml web archives are supported through the email filter, which
- introduces some mild oddities but remains usable.
-
- Text, HTML, email folders, and Scribus files are processed internally. Lyx
- is used to index Lyx files. Many filters need iconv and the standard sed
- and awk.
-
- 5.3. Building from source
-
-5.3.1. Prerequisites
-
- C++ compiler. Up to Recoll version 1.13.04, its absence can manifest
- itself by strange messages about a missing iconv_open.
-
- Development files for Xapian core.
-
- Important: If you are building Xapian for an older CPU (before Pentium 4
- or Athlon 64), you need to add the --disable-sse flag to the configure
- command. Otherwise, all Xapian applications will crash with an illegal
- instruction error.
-
- Development files for Qt.
-
- Development files for X11 and zlib.
-
- Check the Recoll download page for up to date version information.
-
- You will most probably be able to find a binary package for Qt for your
- system. You may have to compile Xapian but this is not difficult (if you
- are using FreeBSD, there is a port).
-
- You may also need libiconv. Recoll currently uses version 1.9 (this should
- not be critical). On Linux systems, the iconv interface is part of libc
- and you should not need to do anything special.
-
-5.3.2. Building
-
- Recoll has been built on Linux, FreeBSD, Mac OS X, and Solaris. Most
- versions released after 2005 should be fine, and maybe some older ones
- too (Solaris 8 works). If you build on another system and need to modify
- things, I would very much welcome patches.
-
- Depending on the Qt 3 configuration on your system, you may have to set
- the QTDIR and QMAKESPECS variables in your environment:
-
- * QTDIR should point to the directory above the one that holds the qt
- include files (e.g., if qt.h is /usr/local/qt/include/qt.h, QTDIR
- should be /usr/local/qt).
-
- * QMAKESPECS should be set to the name of one of the Qt mkspecs
- sub-directories (e.g., linux-g++).
-
- On many Linux systems, QTDIR is set by the login scripts, and QMAKESPECS
- is not needed because there is a default link in mkspecs/.
-
- Neither QTDIR nor QMAKESPECS should be needed with Qt 4: configuration
- details are entirely determined by qmake (which is quite often installed
- as qmake-qt4).
-
- Configure options:
-
- * --without-aspell will disable the code for phonetic matching of search
- terms.
-
- * --with-fam or --with-inotify will enable the code for real time
- indexing. Inotify support is enabled by default on recent Linux
- systems.
-
- * --disable-webkit, available from version 1.17, implements the result
- list with a Qt QTextBrowser instead of a WebKit widget, if you cannot or
- do not want to depend on the latter.
-
- * --enable-xattr will enable code to fetch data from file extended
- attributes. This is only useful if some application stores data in
- there, and it also needs some simple configuration (see comments in
- the fields configuration file).
-
- * --enable-camelcase will enable splitting camelCase words. This is not
- enabled by default as it has the unfortunate side-effect of making
- some phrase searches quite confusing: for example, "MySQL manual"
- would be matched by "MySQL manual" and "my sql manual" but not
- "mysql manual" (only inside phrase searches).
-
- * --with-file-command specifies the version of the 'file' command to
- use (e.g., --with-file-command=/usr/local/bin/file). This can be useful
- to select the GNU version on systems where the native one is inadequate.
-
- * --disable-qtgui disables the Qt interface. This allows building the
- indexer and the command line search program in the absence of a Qt
- environment.
-
- * --disable-x11mon disables X11 connection monitoring inside recollindex.
- Together with --disable-qtgui, this allows building recoll without Qt
- and X11.
-
- * Of course, the usual autoconf configure options, such as --prefix, apply.
-
- Normal procedure:
-
-   cd recoll-xxx
-   ./configure
-   make
-   (practice the usual hardship-repelling invocations)
-
-
- There is little auto-configuration. The configure script will mainly link
- one of the system-specific files in the mk directory to mk/sysconf. If
- your system is not known yet, it will tell you as much, and you may want
- to manually copy and modify one of the existing files (the new file name
- should be the output of uname -s).
-
-5.3.3. Installation
-
- Either type make install or execute recollinstall prefix, in the root of
- the source tree. This will copy the commands to prefix/bin and the sample
- configuration files, scripts and other shared data to prefix/share/recoll.
-
- If the installation prefix given to recollinstall is different from either
- the system default or the value which was specified when executing
- configure (as in configure --prefix /some/path), you will have to set the
- RECOLL_DATADIR environment variable to indicate where the shared data is
- to be found (e.g., for sh or bash: export
- RECOLL_DATADIR=/some/path/share/recoll).
-
- You can then proceed to configuration.
-
- 5.4. Configuration overview
-
- Most of the parameters specific to the recoll GUI are set through the
- Preferences menu and stored in the standard Qt place
- ($HOME/.config/Recoll.org/recoll.conf). You probably do not want to edit
- this by hand.
-
- Recoll indexing options are set inside text configuration files located in
- a configuration directory. There can be several such directories, each of
- which defines the parameters for one index.
-
- The configuration files can be edited by hand or through the Index
- configuration dialog (Preferences menu). The GUI tool will try to respect
- your formatting and comments as much as possible, so it is quite possible
- to use both ways.
-
- The most accurate documentation for the configuration parameters is given
- by comments inside the default files, and we will just give a general
- overview here.
-
- For each index, there are two sets of configuration files. System-wide
- configuration files are kept in a directory named like
- /usr/[local/]share/recoll/examples, and define default values, shared by
- all indexes. For each index, a parallel set of files defines the
- customized parameters.
-
- The default location of the configuration is the .recoll directory in your
- home. Most people will only use this directory.
-
- This location can be changed, or others can be added with the
- RECOLL_CONFDIR environment variable or the -c option parameter to recoll
- and recollindex.
-
- If the .recoll directory does not exist when recoll or recollindex are
- started, it will be created with a set of empty configuration files.
- recoll will give you a chance to edit the configuration file before
- starting indexing. recollindex will proceed immediately. To avoid
- mistakes, the automatic directory creation will only occur for the default
- location, not if -c or RECOLL_CONFDIR were used (in the latter cases, you
- will have to create the directory).
-
- All configuration files share the same format. For example, a short
- extract of the main configuration file might look as follows:
-
- # Space-separated list of directories to index.
- topdirs = ~/docs /usr/share/doc
-
- [~/somedirectory-with-utf8-txt-files]
- defaultcharset = utf-8
-
-
- There are three kinds of lines:
-
- * Comment (starts with #) or empty.
-
- * Parameter affectation (name = value).
-
- * Section definition ([somedirname]).
-
- Depending on the type of configuration file, section definitions either
- separate groups of parameters or allow redefining some parameters for a
- directory sub-tree. They stay in effect until another section definition,
- or the end of file, is encountered. Some of the parameters used for
- indexing are looked up hierarchically from the current directory location
- upwards. Not all parameters can be meaningfully redefined; this is
- specified for each one in the next section.
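-
- The upward lookup can be pictured with a short Python sketch. This is
- not Recoll's code (which is C++): lookup() is a hypothetical helper,
- and the parsed file is assumed to be a dict keyed by absolute
- directory path, with "" holding the parameters set before any section.
-
```python
import os.path
from typing import Dict, Optional

Sections = Dict[str, Dict[str, str]]


def lookup(sections: Sections, filepath: str, name: str) -> Optional[str]:
    """Return the value of `name` for `filepath`: try the section for the
    file's directory, then each parent directory, then the global
    parameters (key ""). Return None if the parameter is not set."""
    directory = os.path.dirname(os.path.abspath(filepath))
    while True:
        value = sections.get(directory, {}).get(name)
        if value is not None:
            return value
        parent = os.path.dirname(directory)
        if parent == directory:  # reached the file system root
            break
        directory = parent
    return sections.get("", {}).get(name)


sections: Sections = {
    "": {"defaultcharset": "iso8859-1"},
    "/home/me/utf8docs": {"defaultcharset": "utf-8"},
}
print(lookup(sections, "/home/me/utf8docs/sub/a.txt", "defaultcharset"))  # utf-8
print(lookup(sections, "/home/me/other/b.txt", "defaultcharset"))  # iso8859-1
```
-
- The closest enclosing section wins, so a file deep below
- /home/me/utf8docs still gets utf-8, while other files fall back to the
- global value.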
-
- When found at the beginning of a file path, the tilde character (~) is
- expanded to the name of the user's home directory, as a shell would do.
-
- White space is used for separation inside lists. List elements with
- embedded spaces can be quoted using double-quotes.
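-
- The three kinds of lines, tilde expansion, and quoted list elements can
- be illustrated with a minimal Python sketch. It is not Recoll's parser:
- parse_config() and split_list() are hypothetical names, and the sketch
- ignores details such as backslash line continuations.
-
```python
import os.path
from typing import Dict, List


def parse_config(text: str) -> Dict[str, Dict[str, str]]:
    """Sort lines into comments/empty lines, `name = value` assignments and
    `[section]` definitions. Assignments before any section go under "".
    A leading ~ in a section name is expanded to the home directory."""
    sections: Dict[str, Dict[str, str]] = {"": {}}
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comment or empty line
        if line.startswith("[") and line.endswith("]"):
            current = os.path.expanduser(line[1:-1].strip())
            sections.setdefault(current, {})
        elif "=" in line:
            name, value = line.split("=", 1)
            sections[current][name.strip()] = value.strip()
    return sections


def split_list(value: str) -> List[str]:
    """Split a list on white space; double quotes group an element that
    contains spaces (the quotes themselves are dropped)."""
    items: List[str] = []
    current: List[str] = []
    in_quotes = started = False
    for ch in value:
        if ch == '"':
            in_quotes, started = not in_quotes, True
        elif ch.isspace() and not in_quotes:
            if started:
                items.append("".join(current))
                current, started = [], False
        else:
            current.append(ch)
            started = True
    if started:
        items.append("".join(current))
    return items


conf = parse_config("""
# Space-separated list of directories to index.
topdirs = ~/docs "/data/My Files"

[~/somedirectory-with-utf8-txt-files]
defaultcharset = utf-8
""")
# List elements that are paths get the same tilde expansion.
topdirs = [os.path.expanduser(p) for p in split_list(conf[""]["topdirs"])]
```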
-
- Encoding issues. Most of the configuration parameters are plain ASCII. Two
- particular sets of values may cause encoding issues:
-
- * File path parameters may contain non-ASCII characters and should use
- the exact same byte values as found in the file system directory.
- Usually, this means that the configuration file should use the system
- default locale encoding.
-
- * The unac_except_trans parameter should be encoded in UTF-8. If your
- system locale is not UTF-8, and you also need to specify non-ASCII
- file paths, this poses a difficulty because common text editors cannot
- handle multiple encodings in a single file. In this relatively
- unlikely case, you can edit the configuration file as two separate
- text files with appropriate encodings, and concatenate them to create
- the complete configuration.
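-
- From a shell, the concatenation is just cat part1 part2 > recoll.conf.
- The Python sketch below does the same thing with byte strings, using
- a hypothetical path and assuming an ISO-8859-1 system locale, to show
- that each part keeps its own encoding in the resulting file.
-
```python
import os
import tempfile

# Hypothetical contents. The path part uses the system locale encoding
# (assumed to be ISO-8859-1 here); unac_except_trans must be UTF-8.
paths_part = "topdirs = ~/Documents/été\n".encode("iso8859-1")
unac_part = "unac_except_trans = åå Åå ää Ää öö Öö\n".encode("utf-8")

# Join raw bytes so that each part keeps its own encoding. Decoding and
# re-encoding the whole file as a single text would corrupt one part.
out_path = os.path.join(tempfile.mkdtemp(), "recoll.conf")
with open(out_path, "wb") as f:
    f.write(paths_part + unac_part)
```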
-
-5.4.1. Main configuration file
-
- recoll.conf is the main configuration file. It defines things like what to
- index (top directories and things to ignore), and the default character
- set to use for document types which do not specify it internally.
-
- The default configuration will index your home directory. If this is not
- appropriate, start recoll to create a blank configuration, click Cancel,
- and edit the configuration file before restarting the command. This will
- start the initial indexing, which may take some time.
-
- Most of the following parameters can be changed from the Index
- Configuration menu in the recoll interface. Some can only be set by
- editing the configuration file.
-
- 5.4.1.1. Parameters affecting what documents we index:
-
- topdirs
-
- Specifies the list of directories or files to index (recursively
- for directories). You can use symbolic links as elements of this
- list. See the followLinks option about following symbolic links
- found under the top elements (not followed by default).
-
- skippedNames
-
- A space-separated list of patterns for names of files or
- directories that should be completely ignored. The list defined in
- the default file is:
-
- skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
- *~ .beagle .git .hg .bzr loop.ps .xsession-errors \
- .recoll* xapiandb recollrc recoll.conf
-
- The list can be redefined at any sub-directory in the indexed
- area.
-
- The top-level directories are not affected by this list (that is,
- a directory in topdirs might match and would still be indexed).
-
- The list in the default configuration does not exclude hidden
- directories (names beginning with a dot), which means that it may
- index quite a few things that you do not want. On the other hand,
- email user agents like thunderbird usually store messages in
- hidden directories, and you probably want this indexed. One
- possible solution is to have .* in skippedNames, and add things
- like ~/.thunderbird or ~/.evolution in topdirs.
-
- Not even the file names are indexed for patterns in this list. See
- the recoll_noindex variable in mimemap for an alternative approach
- which indexes the file names.
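-
- The matching can be modeled in Python as below. This is an
- approximation, assuming that each pattern is compared with the last
- element of the path; is_skipped() is hypothetical, the pattern list is
- abridged, and Python's fnmatch differs from fnmatch(3) in some details.
-
```python
import fnmatch
import os.path
from typing import Iterable, List

# Abridged copy of the default skippedNames list.
SKIPPED_NAMES: List[str] = "#* bin CVS Cache cache* caughtspam tmp .thumbnails .svn *~".split()


def is_skipped(path: str, topdirs: Iterable[str], patterns: List[str] = SKIPPED_NAMES) -> bool:
    """True if the last component of `path` matches a skippedNames
    pattern. Elements of topdirs are never skipped themselves."""
    path = path.rstrip("/")
    if path in (t.rstrip("/") for t in topdirs):
        return False
    name = os.path.basename(path)
    return any(fnmatch.fnmatchcase(name, pat) for pat in patterns)


topdirs = ["/home/me/tmp"]
print(is_skipped("/home/me/tmp", topdirs))             # False: a top directory
print(is_skipped("/home/me/tmp/notes.txt~", topdirs))  # True: matches *~
print(is_skipped("/home/me/tmp/report.txt", topdirs))  # False
```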
-
- skippedPaths and daemSkippedPaths
-
- A space-separated list of patterns for paths of files or
- directories that should be skipped. There is no default in the
- sample configuration file, but the code always adds the
- configuration and database directories in there.
-
- skippedPaths is used both by batch and real time indexing.
- daemSkippedPaths can be used to specify things that should be
- indexed at startup, but not monitored.
-
- Example of use for skipping text files only in a specific
- directory:
-
- skippedPaths = ~/somedir/*.txt
-
-
- skippedPathsFnmPathname
-
- The values in the *skippedPaths variables are matched by default
- with fnmatch(3), with the FNM_PATHNAME and FNM_LEADING_DIR flags.
- This means that '/' characters must be matched explicitly. You
- can set skippedPathsFnmPathname to 0 to disable the use of
- FNM_PATHNAME (meaning that /*/dir3 will match /dir1/dir2/dir3).
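-
- Python's fnmatch module has no FNM_PATHNAME flag, so the sketch below
- translates a pattern into a regular expression by hand to show the
- effect of the flags. It handles only * and ? (no bracket expressions
- or escapes) and is an approximation of fnmatch(3), not Recoll code.
-
```python
import re


def fnmatch_path(pattern: str, path: str, pathname: bool = True) -> bool:
    """Approximate fnmatch(3) with FNM_LEADING_DIR, plus FNM_PATHNAME when
    `pathname` is true. Only the * and ? wildcards are handled."""
    # With FNM_PATHNAME, wildcards never match a '/' character.
    star, qmark = ("[^/]*", "[^/]") if pathname else (".*", ".")
    regex = "".join(
        star if c == "*" else qmark if c == "?" else re.escape(c) for c in pattern
    )
    # FNM_LEADING_DIR: the pattern may match a leading part of the path,
    # provided that what follows starts with '/'.
    return re.fullmatch(regex + "(/.*)?", path, flags=re.DOTALL) is not None


print(fnmatch_path("/*/dir3", "/dir1/dir2/dir3"))                  # False
print(fnmatch_path("/*/dir3", "/dir1/dir2/dir3", pathname=False))  # True
```
-
- With the default flags, ~/somedir/*.txt skips text files directly in
- that directory but not in its subdirectories.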
-
- followLinks
-
- Specifies whether the indexer should follow symbolic links while
- walking the file tree. The default is to ignore symbolic links to
- avoid multiple indexing of linked files. No effort is made to
- avoid duplication when this option is set to true. This option can
- be set individually for each of the topdirs members by using
- sections. It cannot be changed below the topdirs level.
-
- indexedmimetypes
-
- Recoll normally indexes any file which it knows how to read. This
- list lets you restrict the indexed mime types to what you specify.
- If the variable is unspecified or the list empty (the default),
- all supported types are processed.
-
- compressedfilemaxkbs
-
- Size limit for compressed (.gz or .bz2) files. These need to be
- decompressed in a temporary directory for identification, which
- can be very wasteful if 'uninteresting' big compressed files are
- present. Negative means no limit, 0 means no processing of any
- compressed file. Defaults to -1.
-
- textfilemaxmbs
-
- Maximum size for text files. Very big text files are often
- uninteresting logs. Set to -1 to disable (default 20MB).
-
- textfilepagekbs
-
- If set to other than -1, text files will be indexed as multiple
- documents of the given page size. This may be useful if you do
- want to index very big text files as it will both reduce memory
- usage at index time and help with loading data to the preview
- window. A size of a few megabytes would seem reasonable (default:
- 1MB).
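-
- As a rough illustration of paging, the hypothetical Python function
- below cuts a text into pages of pagekbs kilobytes. It cuts at fixed
- byte offsets, which could split a line or a multibyte character; how
- Recoll actually chooses page boundaries is not described here.
-
```python
from typing import Iterator


def text_pages(data: bytes, pagekbs: int) -> Iterator[bytes]:
    """Yield the pages of a text file, each at most `pagekbs` kilobytes.
    -1 (or any non-positive value, an assumption of this sketch) means no
    paging: the whole file is one document."""
    if pagekbs <= 0:
        yield data
        return
    size = pagekbs * 1024
    for start in range(0, len(data), size):
        yield data[start:start + size]


pages = list(text_pages(b"x" * 2500, 1))
print([len(p) for p in pages])  # [1024, 1024, 452]
```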
-
- membermaxkbs
-
- This defines the maximum size in kilobytes for an archive member
- (zip, tar or rar at the moment). Bigger entries will be skipped.
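-
- The size check can be sketched with Python's zipfile module. The
- function name is hypothetical, and comparing the uncompressed member
- size is an assumption of this sketch.
-
```python
import io
import zipfile
from typing import List


def indexable_members(zip_bytes: bytes, membermaxkbs: int) -> List[str]:
    """Names of the archive members whose uncompressed size does not
    exceed membermaxkbs kilobytes; bigger members are skipped."""
    limit = membermaxkbs * 1024
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return [info.filename for info in zf.infolist()
                if not info.is_dir() and info.file_size <= limit]


# Build a small in-memory archive to try it on.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("small.txt", "x" * 100)
    zf.writestr("big.txt", "x" * 5000)
print(indexable_members(buf.getvalue(), membermaxkbs=4))  # ['small.txt']
```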
-
- indexallfilenames
-
- Recoll indexes file names in a special section of the database to
- allow specific file names searches using wild cards. This
- parameter decides if file name indexing is performed only for
- files with mime types that would qualify them for full text
- indexing, or for all files inside the selected subtrees,
- independently of mime type.
-
- usesystemfilecommand
-
- Decides whether the file -i system command is used as a final step for
- determining the mime type for a file (the main procedure uses
- suffix associations as defined in the mimemap file). This can be
- useful for files with suffix-less names, but it will also cause
- the indexing of many bogus "text" files.
-
- processbeaglequeue
-
- If this is set, process the directory where Beagle Web browser
- plugins copy visited pages for indexing. Of course, Beagle MUST
- NOT be running, else things will behave strangely.
-
- beaglequeuedir
-
- The path to the Beagle indexing queue. This is hard-coded in the
- Beagle plugin as ~/.beagle/ToIndex so there should be no need to
- change it.
-
- 5.4.1.2. Parameters affecting how we generate terms:
-
- Changing some of these parameters will imply a full reindex. Also, when
- using multiple indexes, it may not make sense to search indexes that don't
- share the values for these parameters, because they usually affect both
- search and index operations.
-
- indexStripChars
-
- Decides whether terms are stripped of diacritics and converted to
- lowercase before being indexed. If they are not, searches
- sensitive to case and diacritics can be performed, but the index
- will be bigger, and some marginal weirdness may sometimes occur.
- The default is a stripped index (indexStripChars = 1) for now.
- When using multiple indexes for a search, this parameter must be
- defined identically for all. Changing the value implies an index
- reset.
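-
- What stripping does to a term can be approximated in Python with
- Unicode decomposition. Recoll uses its own unaccenting code, so
- results may differ for some characters (ligatures such as œ, for
- example, are not decomposed by NFD); strip_term() is hypothetical.
-
```python
import unicodedata


def strip_term(term: str) -> str:
    """Lowercase a term and remove its diacritics: decompose to Unicode
    NFD, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", term.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


print(strip_term("Élève"))     # eleve
print(strip_term("Ångström"))  # angstrom
```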
-
- maxTermExpand
-
- Maximum expansion count for a single term (e.g.: when using
- wildcards). The default of 10000 is reasonable and will avoid
- queries that appear frozen while the engine is walking the term
- list.
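-
- The effect of the limit can be sketched in Python: the expansion stops
- walking the term list as soon as maxTermExpand matches have been
- collected. expand_wildcard() and the in-memory term list are
- hypothetical stand-ins for the index term list.
-
```python
import fnmatch
from typing import Iterable, List


def expand_wildcard(pattern: str, terms: Iterable[str], max_expand: int = 10000) -> List[str]:
    """Return the terms matching `pattern`, stopping after `max_expand`
    matches so that a very general wildcard cannot walk the whole list."""
    matches: List[str] = []
    for term in terms:
        if fnmatch.fnmatchcase(term, pattern):
            matches.append(term)
            if len(matches) >= max_expand:
                break  # stop walking the term list early
    return matches


terms = [f"word{i}" for i in range(50)]
print(len(expand_wildcard("word*", terms, max_expand=10)))  # 10
```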
-
- maxXapianClauses
-
- Maximum number of elementary clauses we can add to a single Xapian
- query. In some cases, the result of term expansion can be
- multiplicative, and we want to avoid using excessive memory. The
- default of 100 000 should be both high enough in most cases and
- compatible with current typical hardware configurations.
-
- nonumbers
-
- If this is set to true, no terms will be generated for numbers. For
- example, "123", "1.5e6", and "192.168.1.4" would not be indexed
- ("value123" would still be). Numbers are often quite interesting
- to search for, and this should probably not be set except for
- special situations, such as scientific documents with huge amounts
- of numbers in them. This can only be set for a whole index, not for
- a subtree.
-
- nocjk
-
- If this is set to true, specific East Asian (Chinese, Japanese,
- Korean) character and word splitting is turned off. This will save
- a small amount of CPU if you have no CJK documents. If your document
- base does include such text but you are not interested in searching
- it, setting nocjk may be a significant time and space saver.
-
- cjkngramlen
-
- This lets you adjust the size of n-grams used for indexing CJK
- text. The default value of 2 is probably appropriate in most
- cases. A value of 3 would allow more precision and efficiency on
- longer words, but the index will be approximately twice as large.
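-
- N-gram splitting itself is simple to illustrate. The Python sketch
- below (a hypothetical helper, not Recoll's splitter) turns a run of CJK
- characters into overlapping n-grams; a larger n produces longer, more
- specific terms.
-
```python
from typing import List


def ngrams(run: str, n: int = 2) -> List[str]:
    """Split a run of CJK characters into overlapping n-grams. Runs that
    are not longer than n are kept whole."""
    if len(run) <= n:
        return [run] if run else []
    return [run[i:i + n] for i in range(len(run) - n + 1)]


print(ngrams("日本語テキスト"))  # ['日本', '本語', '語テ', 'テキ', 'キス', 'スト']
```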
-
- indexstemminglanguages
-
- A list of languages for which the stem expansion databases will be
- built. See recollindex(1) or use the recollindex -l command for
- possible values. You can add a stem expansion database for a
- different language by using recollindex -s, but it will be deleted
- during the next indexing. Only languages listed in the
- configuration file are permanent.
-
- defaultcharset
-
- The name of the character set used for files that do not contain a
- character set definition (ie: plain text files). This can be
- redefined for any sub-directory. If it is not set at all, the
- character set used is the one defined by the NLS environment
- (LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
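-
- The fallback order can be sketched in Python. default_charset() is
- hypothetical; the precedence LC_ALL, LC_CTYPE, LANG follows the usual
- POSIX rule, and falling back to iso8859-1 when a locale name carries no
- codeset (such as LANG=C) is an assumption of this sketch.
-
```python
from typing import Mapping, Optional


def default_charset(configured: Optional[str], env: Mapping[str, str]) -> str:
    """Charset for files that do not declare one: the configured
    defaultcharset if any, else the codeset of the first non-empty locale
    variable (LC_ALL, then LC_CTYPE, then LANG), else iso8859-1."""
    if configured:
        return configured
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            # A locale name looks like language_TERRITORY.codeset@modifier.
            codeset = value.partition(".")[2].partition("@")[0]
            return codeset or "iso8859-1"
    return "iso8859-1"


print(default_charset(None, {"LANG": "en_US.UTF-8"}))  # UTF-8
```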
-
- unac_except_trans
-
- This is a list of characters, encoded in UTF-8, which should be
- handled specially when converting text to unaccented lowercase.
- For example, in Swedish, the letter a with diaeresis (ä) is a full
- letter of the alphabet and should not be turned into an a. Each
- element in the space-separated list has the special character as
- its first character, followed by its translation. The handling of
- both the lowercase and upper-case versions of a character should be
- specified, as membership in the list turns off both standard
- accent and case processing. Example for Swedish:
-
- unac_except_trans = åå Åå ää Ää öö Öö
-
-
- Note that the translation is not limited to a single character;
- you could very well have something like üue in the list.
-
- The default value set for unac_except_trans can't be listed here
- because I have trouble with SGML and UTF-8, but it only contains
- ligature decompositions: German ss, oe, ae, fi, fl.
-
- This parameter can't be defined for subdirectories, it is global,
- because there is no way to do otherwise when querying. If you have
- document sets which would need different values, you will have to
- index and query them separately.
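-
- The following Python sketch shows how such a list could be applied.
- The helper names are hypothetical, text is assumed to be in
- precomposed (NFC) form, and other characters are unaccented with NFD
- decomposition, which only approximates Recoll's own processing.
-
```python
import unicodedata
from typing import Dict


def parse_exceptions(value: str) -> Dict[str, str]:
    """Each space-separated element is the special character followed by
    its translation (e.g. 'üue' maps 'ü' to 'ue')."""
    return {item[0]: item[1:] for item in value.split()}


def unac_lower(text: str, exceptions: Dict[str, str]) -> str:
    """Convert to unaccented lowercase, except for listed characters,
    which are replaced by their translation and get no other processing."""
    out = []
    for ch in text:
        if ch in exceptions:
            out.append(exceptions[ch])
        else:
            decomposed = unicodedata.normalize("NFD", ch.lower())
            out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


swedish = parse_exceptions("åå Åå ää Ää öö Öö")
print(unac_lower("Åsa Över Élan", swedish))  # åsa över elan
```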
-
- maildefcharset
-
- This can be used to define the default character set specifically
- for email messages which don't specify it. This is mainly useful
- for readpst (libpst) dumps, which are utf-8 but do not say so.
-
- localfields
-
- This allows setting fields for all documents under a given
- directory. Typical usage would be to set an "rclaptg" field, to be
- used in mimeview to select a specific viewer. If several fields
- are to be set, they should be separated with a colon (':')
- character (there is currently no way to escape it). For example:
- localfields = rclaptg=gnus:other = val, then select a specific
- viewer with mimetype|tag=... in mimeview.
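-
- Splitting such a value is straightforward; in this hypothetical Python
- helper, white space around names and values is assumed to be
- insignificant.
-
```python
from typing import Dict


def parse_localfields(value: str) -> Dict[str, str]:
    """Split a localfields value on ':' into name=value assignments.
    There is no escape for ':' inside values."""
    fields: Dict[str, str] = {}
    for assignment in value.split(":"):
        name, sep, val = assignment.partition("=")
        if sep:  # ignore fragments without an '='
            fields[name.strip()] = val.strip()
    return fields


print(parse_localfields("rclaptg=gnus:other = val"))  # {'rclaptg': 'gnus', 'other': 'val'}
```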
-
- 5.4.1.3. Parameters affecting where and how we store things:
-
- dbdir
-
- The name of the Xapian data directory. It will be created if
- needed when the index is initialized. If this is not an absolute
- path, it will be interpreted relative to the configuration
- directory. The value can have embedded spaces but starting or
- trailing spaces will be trimmed. You cannot use quotes here.
-
- idxstatusfile
-
- The name of the scratch file where the indexer process updates its
- status. Default: idxstatus.txt inside the configuration directory.
-
- maxfsoccuppc
-
- Maximum file system occupation before we stop indexing. The value
- is a percentage, corresponding to what the "Capacity" df output
- column shows. The default value is 0, meaning no checking.
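-
- A minimal Python sketch of the check, assuming df's rounded-up
- used / (used + available) formula and assuming that indexing stops when
- the value strictly exceeds the limit; both function names are
- hypothetical.
-
```python
import shutil


def occupation_pct(used: int, avail: int) -> int:
    """Percentage like df's Capacity column: used / (used + available),
    rounded up. Blocks reserved for root count as neither."""
    total = used + avail
    if total == 0:
        return 0
    return -(-used * 100 // total)  # ceiling division


def should_stop_indexing(path: str, maxfsoccuppc: int) -> bool:
    """0 disables the check; otherwise compare the current occupation of
    the file system holding `path` with the limit."""
    if maxfsoccuppc == 0:
        return False
    usage = shutil.disk_usage(path)
    return occupation_pct(usage.used, usage.free) > maxfsoccuppc


print(occupation_pct(used=1, avail=2))  # 34
print(should_stop_indexing("/", 0))     # False
```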
-
- mboxcachedir
-
- The directory where mbox message offsets cache files are held.
- This is normally $RECOLL_CONFDIR/mboxcache, but it may be useful
- to share a directory between different configurations.
-
- mboxcacheminmbs
-
- The minimum mbox file size over which we cache the offsets. There
- is really no sense in caching offsets for small files. The default
- is 5 MB.
-
- webcachedir
-
- This is only used by the Beagle web browser plugin indexing code,
- and defines where the cache for visited pages will live. Default:
- $RECOLL_CONFDIR/webcache.
-
- webcachemaxmbs
-
- This is only used by the Beagle web browser plugin indexing code,
- and defines the maximum size for the web page cache. Default: 40
- MB.
-
- idxflushmb
-
- Threshold (megabytes of new text data) where we flush from memory
- to disk index. Setting this can help control memory usage. A value
- of 0 means no explicit flushing, letting Xapian use its own
- default, which is flushing every 10000 (or XAPIAN_FLUSH_THRESHOLD)
- documents, which gives little memory usage control, as memory
- usage depends on average document size. The default value is 10.
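-
- The flushing policy can be sketched as a running byte count. The
- Python function below is hypothetical and only reports after which
- documents a flush would happen.
-
```python
from typing import Iterable, List


def flush_points(doc_sizes: Iterable[int], idxflushmb: int) -> List[int]:
    """Return the (0-based) indices of the documents after which a flush
    happens, flushing whenever at least idxflushmb megabytes of new text
    have accumulated. 0 means no explicit flushing."""
    if idxflushmb <= 0:
        return []
    threshold = idxflushmb * 1024 * 1024
    points: List[int] = []
    pending = 0
    for i, size in enumerate(doc_sizes):
        pending += size
        if pending >= threshold:
            points.append(i)
            pending = 0
    return points


MB = 1024 * 1024
print(flush_points([4 * MB] * 5, idxflushmb=10))  # [2]
```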
-
- 5.4.1.4. Miscellaneous parameters:
-
- autodiacsens
-
- If the index is not stripped, decides whether diacritics
- sensitivity is automatically triggered when the search term has
- accented characters (not in unac_except_trans). Otherwise, you
- need to use the query language and the D modifier to specify
- diacritics sensitivity. Default is no.
-
- autocasesens
-
- If the index is not stripped, decides whether character case
- sensitivity is automatically triggered when the search term has
- upper-case characters in any but the first position. Otherwise,
- you need to use the query language and the C modifier to specify
- character-case sensitivity. Default is yes.
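-
- The two triggers can be sketched in Python as follows. The function
- names are hypothetical, and the accent test (a character that
- decomposes into a base plus combining marks) is an assumption about
- what counts as accented.
-
```python
import unicodedata


def wants_case_sensitivity(term: str) -> bool:
    """autocasesens: an upper-case character anywhere but the first position."""
    return any(c.isupper() for c in term[1:])


def wants_diacritics_sensitivity(term: str, except_chars: str = "") -> bool:
    """autodiacsens: the term contains an accented character that is not
    listed in unac_except_trans (passed here as `except_chars`)."""
    for c in term:
        if c in except_chars:
            continue
        if any(unicodedata.combining(d) for d in unicodedata.normalize("NFD", c)):
            return True
    return False


print(wants_case_sensitivity("MySQL"), wants_case_sensitivity("Mysql"))  # True False
```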
-
- loglevel, daemloglevel
-
- Verbosity level for recoll and recollindex. A value of 4 lists
- quite a lot of debug/information messages. 2 only lists errors.
- daemloglevel is specific to the indexing monitor daemon.
-
- logfilename, daemlogfilename
-
- Where the messages should go. 'stderr' can be used as a special
- value, and is the default. daemlogfilename is specific to the
- indexing monitor daemon.
-
- mondelaypatterns
-
- This allows specifying wildcard path patterns (processed with
- fnmatch(3) with no flags) to match files which change too often and
- for which a delay should be observed before re-indexing. This is a
- space-separated list, each entry being a pattern and a time in
- seconds, separated by a colon. You can use double quotes if a path
- entry contains white space. Example:
-
- mondelaypatterns = *.log:20 "this one has spaces*:10"
-
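- Parsing and matching the list can be sketched in Python. shlex.split()
- handles the double quotes, but it also interprets backslashes and
- single quotes, which Recoll may not; Python's fnmatch also differs from
- fnmatch(3) in details such as backslash escapes. The helper names are
- hypothetical.
-
```python
import fnmatch
import shlex
from typing import List, Optional, Tuple


def parse_delays(value: str) -> List[Tuple[str, int]]:
    """Parse 'pattern:seconds' entries; double quotes protect white space."""
    entries = []
    for item in shlex.split(value):
        pattern, _, seconds = item.rpartition(":")
        entries.append((pattern, int(seconds)))
    return entries


def delay_for(path: str, entries: List[Tuple[str, int]]) -> Optional[int]:
    """Delay of the first matching pattern, or None to reindex normally.
    With no flags, '*' also matches '/' characters."""
    for pattern, seconds in entries:
        if fnmatch.fnmatchcase(path, pattern):
            return seconds
    return None


entries = parse_delays('*.log:20 "this one has spaces*:10"')
print(delay_for("/var/log/app.log", entries))    # 20
print(delay_for("/home/me/notes.txt", entries))  # None
```
-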
-
- monixinterval
-
- Minimum interval (seconds) for processing the indexing queue. The
- real time monitor does not process each event when it comes in,
- but will wait this time for the queue to accumulate to diminish
- overhead and in order to aggregate multiple events to the same
- file. Default: 30 seconds.
-
- monauxinterval
-
- Period (in seconds) at which the real time monitor will regenerate
- the auxiliary databases (spelling, stemming) if needed. The
- default is one hour.
-
- monioniceclass, monioniceclassdata
-
- These allow defining the ionice class and data used by the indexer
- (default class 3, no data).
-
- filtermaxseconds
-
- Maximum filter execution time, after which the filter is aborted.
- Some PostScript programs just loop forever...
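-
- A time limit on an external command can be sketched with Python's
- subprocess module, which kills the child when the timeout expires.
- run_filter() is hypothetical and not Recoll's filter execution code.
-
```python
import subprocess
import sys
from typing import List, Optional


def run_filter(cmd: List[str], filtermaxseconds: float) -> Optional[bytes]:
    """Run an external filter and return its standard output, or None if
    it did not finish within filtermaxseconds (the process is killed)."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=filtermaxseconds)
    except subprocess.TimeoutExpired:
        return None
    return result.stdout


if __name__ == "__main__":
    # A well-behaved "filter", then one that would loop for too long.
    print(run_filter([sys.executable, "-c", "print('ok')"], 10))
    print(run_filter([sys.executable, "-c", "import time; time.sleep(5)"], 1))
```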
-
- filtersdir
-
- A directory to search for the external filter scripts used to
- index some types of files. The value should not be changed, except
- if you want to modify one of the default scripts. The value can be
- redefined for any sub-directory.
-
- iconsdir
-
- The name of the directory where recoll result list icons are
- stored. You can change this if you want different images.
-
- idxabsmlen
-
- Recoll stores an abstract for each indexed file inside the
- database. The text can come from an actual 'abstract' section in
- the document or will just be the beginning of the document. It is
- stored in the index so that it can be displayed inside the result
- lists without decoding the original file. The idxabsmlen parameter
- defines the size of the stored abstract. The default value is 250
- bytes. The search interface gives you the choice to display this
- stored text or a synthetic abstract built by extracting text
- around the search terms. If you always prefer the synthetic
- abstract, you can reduce this value and save a little space.
-
- aspellLanguage
-
- Language definitions to use when creating the aspell dictionary.
- The value must match a set of aspell language definition files.
- You can type "aspell config" to see where these are installed
- (look for data-dir). If the variable is not set, the value is
- guessed from your desktop national language environment.
-
- noaspell
-
- If this is set, the aspell dictionary generation is turned off.
- Useful for cases where you don't need the functionality or when it
- is unusable because aspell crashes during dictionary generation.
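-
- For example, to force French dictionary generation you might set the
- first line below, assuming the French aspell language files are
- installed. The second line, shown commented out, would instead turn
- dictionary generation off (assuming a boolean value written as 1):
-
- aspellLanguage = fr
- # noaspell = 1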
-
- mhmboxquirks
-
- This allows defining location-related quirks for the mailbox
- handler. Currently only the tbird flag is defined, and it should
- be set for directories which hold Thunderbird data, as their
- folder format is unusual.
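-
- Because the quirk is location-related, it would normally be set inside
- a subtree section of recoll.conf. A sketch, assuming the Thunderbird
- data lives in the usual ~/.thunderbird directory:
-
- [~/.thunderbird]
- mhmboxquirks = tbird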
-
-5.4.2. The fields file
-
- This file contains information about dynamic fields handling in Recoll.
- Some very basic fields have hard-wired behaviour, and you should mostly
- leave the original data inside the fields file unchanged. You can,
- however, create custom fields fitting your data and handle them as if
- they were native ones.
-
- The fields file has several sections, each of which defines an aspect of
- fields processing. Quite often, you will have to modify several sections
- to obtain the desired behaviour.
-
- We only give a short description here; refer to the comments inside
- the file for more detailed information.
-
- Field names should be lowercase alphabetic ASCII.
-
- [prefixes]
-
- A field becomes indexed (searchable) by having a prefix defined in
- this section.
-
- [stored]
-
- A field becomes stored (displayable inside results) by having its
- name listed in this section (typically with an empty value).
-
- [aliases]
-
- This section defines lists of synonyms for the canonical names
- used inside the [prefixes] and [stored] sections.
-
- filter-specific sections
-
- Some filters may need specific configuration for handling fields.
- Only the email message filter currently has such a section (named
- [mail]). It allows indexing arbitrary email headers in addition to
- the ones indexed by default. Other such sections may appear in the
- future.
-
- Here follows a small example of a personal fields file. It extracts
- a specific email header and uses it as a searchable field, with data
- displayable inside result lists. (Side note: as the email filter does no
- decoding on the values, only plain ASCII headers can be indexed, and only
- the first occurrence will be used for headers that occur several times.)
-
- [prefixes]
- # Index mailmytag contents (with the given prefix)
- mailmytag = XMTAG
-
- [stored]
- # Store mailmytag inside the document data record (so that it can be
- # displayed - as %(mailmytag) - in result lists).
- mailmytag =
-
- [mail]
- # Extract the X-My-Tag mail header, and use it internally with the
- # mailmytag field name
- x-my-tag = mailmytag
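-
- An [aliases] section could be added to the same file so that a shorter
- name also refers to the new field. This is a hypothetical addition,
- assuming the "canonical name = list of synonyms" layout described for
- that section:
-
- [aliases]
- # Let "mytag" be used as a synonym of mailmytag
- mailmytag = mytag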
-
-5.4.3. The mimemap file
-
- mimemap specifies the file name extension to mime type mappings.
-
- For file names without an extension, or with an unknown one, the system's
- file -i command will be executed to determine the mime type (this can be
- switched off inside the main configuration file).
-
- The mappings can be specified on a per-subtree basis, which may be useful
- in some cases. Example: gaim logs have a .txt extension but should be
- handled specially, which is possible because they are usually all located
- in one place.
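-
- A per-subtree mapping is written as a section named after the
- directory. For example, assuming (hypothetically) that the gaim logs
- live in ~/.gaim and that mimeconf defines a handler for a
- text/x-gaim-log type:
-
- [~/.gaim]
- .txt = text/x-gaim-log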
-
- mimemap also has a recoll_noindex variable which is a list of suffixes.
- Matching files will be skipped (which avoids unnecessary decompressions or
- file executions). This is partially redundant with skippedNames in the
- main configuration file, with a few differences: it will not affect
- directories, it cannot be made dependent on the file-system location (it
- is a configuration-wide parameter), and the file names will still be
- indexed (not even the file names are indexed for patterns in
- skippedNames). recoll_noindex is used mostly for things known to be
- unindexable by a given Recoll version. Having it there avoids cluttering
- the more user-oriented and locally customized skippedNames.
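-
- For illustration, a recoll_noindex line is just a space-separated list
- of suffixes. The ones below are arbitrary examples, not the shipped
- default:
-
- recoll_noindex = .tar.gz .tgz .iso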
-
-5.4.4. The mimeconf file
-
- mimeconf specifies how the different mime types are handled for indexing,
- and which icons are displayed in the recoll result lists.
-
- Changing the parameters in the [index] section is probably not a good idea
- except if you are a Recoll developer.
-
- The [icons] section allows you to change the icons which are displayed by
- recoll in the result lists (the values are the basenames of the png images
- inside the iconsdir directory, which is specified in recoll.conf).
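-
- As a hypothetical example, to display a custom image, stored as
- myicon.png inside iconsdir, for plain text results, your personal
- mimeconf could contain:
-
- [icons]
- text/plain = myicon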
-
-5.4.5. The mimeview file
-
- mimeview specifies which programs are started when you click on an Open
- link in a result list. For example, HTML is normally displayed using
- Firefox, but you may prefer Konqueror, or your OpenOffice.org program
- might be named ooffice instead of openoffice, etc.
-
- Changes to this file can be done by direct editing, or through the recoll
- GUI preferences dialog.
-
- If Use desktop preferences to choose document editor is checked in the
- Recoll GUI preferences, all mimeview entries will be ignored except the
- one labelled application/x-all (which is set to use xdg-open by default).
-
- In this case, the xallexcepts top level variable defines a list of mime
- type exceptions which will be processed according to the local entries
- instead of being passed to the desktop. This is so that specific Recoll
- options such as a page number or a search string can be passed to
- applications that support them, such as the evince viewer.
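-
- A sketch of the corresponding mimeview entries follows. The
- application/x-all line reflects the xdg-open default mentioned above,
- while the exception list is only an example (each listed type is
- assumed to have its own [view] entry, for instance in the central
- file):
-
- xallexcepts = application/pdf application/postscript
-
- [view]
- application/x-all = xdg-open %f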
-
- As for the other configuration files, the normal usage is to have a
- mimeview inside your own configuration directory, with just the
- non-default entries, which will override those from the central
- configuration file.
-
- All viewer definition entries must be placed under a [view] section.
-
- The keys in the file are normally mime types. You can add an application
- tag to specialize the choice for an area of the filesystem (using a
- localfields specification in mimeconf). The syntax for the key is
- mimetype|tag.
-
- The nouncompforviewmts entry (placed at the top level, outside of the
- [view] section) holds a list of mime types that should not be
- uncompressed before starting the viewer when they are found compressed
- (e.g. mydoc.doc.gz).
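-
- Both features are sketched in the following mimeview fragment. The
- "mytag" tag and the "myhtmlviewer" command are hypothetical, and the
- tag would have to be set for the relevant area of the filesystem as
- described above:
-
- nouncompforviewmts = text/plain application/pdf
-
- [view]
- text/html|mytag = myhtmlviewer %f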
-
- The right side of each assignment holds a command to be executed for
- opening the file. The following substitutions are performed:
-
- * %D. Document date
-
- * %f. File name. This may be the name of a temporary file if it was
- necessary to create one (e.g. to extract a subdocument from a
- container).
-
- * %F. Original file name. Same as %f except if a temporary file is used.
-
- * %i. Internal path, for subdocuments of containers. The format depends
- on the container type. If this appears in the command line, Recoll
- will not create a temporary file to extract the subdocument, expecting
- the called application (possibly a script) to be able to handle it.
-
- * %M. Mime type
-
- * %p. Page index. Only significant for a subset of document types,
- currently only PDF, Postscript and DVI files. Can be used to start the
- editor at the right page for a match or snippet.
-
- * %s. Search term. The value will only be set for documents with indexed
- page numbers (e.g. PDF). The value will be one of the matched search
- terms. It can be used, for example, to preset the "Find" entry inside
- Evince for easy highlighting of the term.
-
- * %U, %u. URL.
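-
- For example, a [view] entry for PDF files using the page and search
- term substitutions could look like the line below. This assumes an
- evince version that accepts the --page-index and --find options; check
- evince --help on your system:
-
- [view]
- application/pdf = evince --page-index=%p --find=%s %f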
-
- In addition to the predefined values above, all strings like %(fieldname)
- will be replaced by the value of the field named fieldname for the
- document. This could be used in combination with field customisation to
- help with opening the document.
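-
- As a purely hypothetical illustration, reusing the mailmytag field from
- the fields file example above and an imaginary mytagviewer program that
- accepts a --tag option:
-
- [view]
- message/rfc822 = mytagviewer --tag=%(mailmytag) %f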
-
-5.4.6. Examples of configuration adjustments
-
- 5.4.6.1. Adding an external viewer for a non-indexed type
-
- Imagine that you have some kind of file which does not have indexable
- content, but for which you would like to have a functional Open link in
- the result list (when found by file name). The file names end in .blob and
- can be displayed by application blobviewer.
-
- You need two entries in the configuration files for this to work:
-
- * In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the
- following line:
-
- .blob = application/x-blobapp
-
- Note that the mime type is made up here, and you could call it
- diesel/oil just the same.
- * In $RECOLL_CONFDIR/mimeview under the [view] section, add:
-
- application/x-blobapp = blobviewer %f
-
- We are supposing that blobviewer wants a file name parameter here; you
- would use %u if it preferred URLs.
-
- If you just wanted to change the application used by Recoll to display a
- mime type which it already knows, you would only need to edit mimeview.
- The entries you add in your personal file override those in the central
- configuration, which you do not need to alter. mimeview can also be
- modified from the GUI.
-
- 5.4.6.2. Adding indexing support for a new file type
-
- Let us now imagine that the above .blob files actually contain indexable
- text and that you know how to extract it with a command line program.
- Getting Recoll to index the files is easy. You need to perform the above
- alteration, and also to add data to the mimeconf file (typically in
- ~/.recoll/mimeconf):
-
- * Under the [index] section, add the following line (more about the
- rclblob indexing script later):
-
- application/x-blobapp = exec rclblob
-
- * Under the [icons] section, you should choose an icon to be displayed
- for the files inside the result lists. Icons are normally 64x64 pixels
- PNG files which live in /usr/[local/]share/recoll/images.
-
- * Under the [categories] section, you should add the mime type where it
- makes sense (you can also create a category). Categories may be used
- for filtering in advanced search.
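-
- Put together, the personal mimeconf additions for the .blob example
- could look like the sketch below. The blob icon name assumes that you
- have put a blob.png image in the icons directory, and the [categories]
- line creates a new, hypothetical "blob" category rather than extending
- an existing one:
-
- [index]
- application/x-blobapp = exec rclblob
-
- [icons]
- application/x-blobapp = blob
-
- [categories]
- blob = application/x-blobapp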
-
- The rclblob filter should be an executable program or script which exists
- inside /usr/[local/]share/recoll/filters. It will be given a file name as
- argument and should output the text or html contents on the standard
- output.
-
- The filter programming section describes in more detail how to write a
- filter.
-