.\" $Id: recoll.conf.5,v 1.5 2007-07-13 10:18:49 dockes Exp $ (C) 2005 J.F.Dockes\$
.TH RECOLL.CONF 5 "8 January 2006"
.SH NAME
recoll.conf \- main personal configuration file for Recoll
.SH DESCRIPTION
This file defines the indexing configuration for the Recoll full-text search
system.
.LP
The system-wide configuration file is normally located inside
/usr/[local]/share/recoll/examples. Any parameter set in the common file
may be overridden by setting it in the personal configuration file, by default:
.IR $HOME/.recoll/recoll.conf
.LP
Please note that while we try to keep this manual page reasonably up to date,
it will frequently lag behind the current state of the software. The best
source of information about the configuration is the set of comments in the
configuration file.

.LP
A short extract of the file might look as follows:
.IP
.nf

# Space-separated list of directories to index.
topdirs =  ~/docs /usr/share/doc

[~/somedirectory-with-utf8-txt-files]
defaultcharset = utf-8

.fi
.LP
There are three kinds of lines: 
.RS
.IP \(bu
Comment or empty
.IP \(bu
Parameter assignment
.IP \(bu
Section definition
.RE
.LP
Empty lines or lines beginning with # are ignored.
.LP
Assignment lines are in the form 'name = value'.
.LP
Section lines allow redefining a parameter for a directory subtree. Some of
the parameters used for indexing are looked up hierarchically, from the
most specific section to the least specific. Not all parameters can be
meaningfully redefined; this is noted for each parameter in the next section.
.LP
The tilde character (~) is expanded in file names to the name of the user's
home directory.
.LP
Where values are lists, white space is used for separation, and elements with
embedded spaces can be quoted with double-quotes.
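.LP
As an illustration of the rules above, a fragment combining them might look
like the following sketch. The directory names and the global character set
are hypothetical placeholders, not defaults:
.IP
.nf

# Hypothetical example: one list element contains an embedded space.
topdirs = ~/docs "/home/me/My Documents"
defaultcharset = iso-8859-1

# Hypothetical subtree where the value above is redefined.
[~/docs/utf8-notes]
defaultcharset = utf-8

.fi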
.SH OPTIONS
.TP
.BI "topdirs = "  directories
Specifies the list of directories to index (recursively). 
.TP
.BI "dbdir = " directory
The name of the Xapian database directory. It will be created if needed
when the database is initialized. If this is not an absolute pathname, it
will be taken relative to the configuration directory.
.TP
.BI "skippedNames = " patterns
A space-separated list of patterns for names of files or directories that
should be completely ignored. The list defined in the default file is:
.sp
.nf
*~ #* bin CVS  Cache caughtspam  tmp

.fi
The list can be redefined for subdirectories, but is only actually changed
for the top level ones in
.IR topdirs .
.TP
.BI "skippedPaths = " patterns
A space-separated list of patterns for paths the indexer should not descend
into. Together with topdirs, this allows pruning the indexed tree to one's
heart's content. daemSkippedPaths can be used to define a specific value for
the real time indexing monitor.
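.sp
As a sketch only, with hypothetical paths chosen for illustration, the two
parameters might be combined like this:
.sp
.nf
# Hypothetical paths, for illustration only.
topdirs = /home/me
skippedPaths = /home/me/.cache /home/me/tmp
# Assumed use case: the real time monitor also ignores a busy directory.
daemSkippedPaths = /home/me/.cache /home/me/tmp /home/me/Downloads

.fi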
.TP
.BI "followLinks = " boolean
Specifies if the indexer should follow
symbolic links while walking the file tree. The default is
to ignore symbolic links to avoid multiple indexing of
linked files. No effort is made to avoid duplication when
this option is set to true. This option can be set
individually for each of the 
.I topdirs
members by using sections. It cannot be changed below the
.I topdirs
level.
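.sp
For instance, assuming two hypothetical
.I topdirs
members, links could be followed in only one of them with a fragment along
these lines:
.sp
.nf
# Hypothetical directories, for illustration only.
topdirs = ~/docs ~/projects

# Follow symbolic links under this topdirs member only.
[~/projects]
followLinks = true

.fi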
.TP
.BI "loglevel = " value
Verbosity level for recoll and recollindex. A value of 4 lists quite a lot of
debug/information messages. 3 lists only errors. 
.B daemloglevel
can be used to specify a different value for the real-time indexing daemon.
.TP
.BI "logfilename = " file
The file where messages are written. 'stderr' can be used as a special value.
.B daemlogfilename
can be used to specify a different value for the real-time indexing daemon.
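.sp
A possible combination of the logging parameters is sketched below. The log
file path is a hypothetical example:
.sp
.nf
# Errors only, on stderr, for recoll and recollindex.
loglevel = 3
logfilename = stderr
# More verbose output for the real-time indexing daemon.
# The file name is hypothetical.
daemloglevel = 4
daemlogfilename = /tmp/recollindex-daemon.log

.fi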
.TP
.BI "indexstemminglanguages = " languages
A list of languages for which the stem expansion databases will be
built. See recollindex(1) for possible values.
.TP
.BI "defaultcharset = " charset
The name of the character set used for files that do not contain a
character set definition (i.e. plain text files). This can be redefined for
any subdirectory.
.TP
.BI "maxfsoccuppc = " percentnumber
Maximum file system occupancy before indexing stops. The value is a
percentage, corresponding to what the "Capacity" df output column shows.
The default value is 0, meaning no checking.
.TP
.BI "idxflushmb = " megabytes
Threshold (megabytes of new text data) at which index data is flushed
from memory to disk. Setting this can
help control memory usage. A value of 0 means no explicit
flushing, letting Xapian use its own default, which is
flushing every 10000 documents (memory usage depends on
average document size). The default value is 10.
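.sp
As an illustration, the following line would request a flush after roughly
50 megabytes of new text. The figure is an arbitrary example, not a
recommendation:
.sp
.nf
# Arbitrary example value: flush less often than the default of 10.
idxflushmb = 50

.fi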
.TP
.BI "filtersdir = " directory
A directory to search for the external filter scripts used to index some
types of files. The value should not be changed, except if you want to
modify one of the default scripts. The value can be redefined for any
subdirectory. 
.TP
.BI "iconsdir = " directory
The name of the directory where 
.B recoll
result list icons are stored. You can change this if you want different
images.
.TP
.BI "guesscharset = " boolean
Try to guess the character set of files if no internal value is available
(i.e. for plain text files). This does not work well in general, and should
probably not be used.
.TP
.BI "usesystemfilecommand = " boolean
Determines whether to use the
.B "file -i"
system command as a final step for determining the mime type for a file
(the main procedure uses suffix associations as defined in the
.B mimemap
file). This can be useful for files with suffixless names, but it will
also cause the indexing of many bogus "text" files.
.TP
.BI "indexedmimetypes = " list
Recoll normally indexes any file which it knows how to read. This list lets
you restrict the indexed mime types to what you specify. If the variable is
unspecified or the list empty (the default), all supported types are
processed.
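.sp
A minimal sketch, assuming the three mime types below are among those
supported by your installation:
.sp
.nf
# Index only these types. The selection is an illustrative assumption.
indexedmimetypes = text/plain text/html application/pdf

.fi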
.TP
.BI "compressedfilemaxkbs = " value
Size limit for compressed (.gz or .bz2) files. These need to be
decompressed in a temporary directory for identification, which can be very
wasteful if 'uninteresting' big compressed files are present.  Negative
means no limit, 0 means no processing of any compressed file. Defaults 
to \-1.
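.sp
For example, assuming the value is expressed in kilobytes, as the parameter
name suggests, the following would skip compressed files larger than about
20 megabytes:
.sp
.nf
# Assumed unit: kilobytes. The limit itself is an arbitrary example.
compressedfilemaxkbs = 20000

.fi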
.TP
.BI "indexallfilenames = " boolean
Recoll indexes file names into a special section of the database to allow
file name searches using wild cards. This parameter decides if
file name indexing is performed only for files with mime types that would
qualify them for full text indexing, or for all files inside
the selected subtrees, independent of mime type.
.TP
.BI "idxabsmlen = " value
Recoll stores an abstract for each indexed file inside the database. The
text can come from an actual 'abstract' section in the document or will
just be the beginning of the document. It is stored in the index so that it
can be displayed inside the result lists without decoding the original
file. The
.I idxabsmlen
parameter defines the size of the stored abstract. The default value is 250
bytes.  The search interface gives you the choice to display this stored
text or a synthetic abstract built by extracting text around the search
terms. If you always prefer the synthetic abstract, you can reduce this
value and save a little space.
.TP
.BI "aspellLanguage = " lang
Language definitions to use when creating the aspell dictionary.  The value
must match a set of aspell language definition files. You can type "aspell
config" to see where these are installed (look for data-dir). The default
if the variable is not set is to use your desktop national language
environment to guess the value.
.TP
.BI "noaspell = " boolean
If this is set, the aspell dictionary generation is turned off. Useful for
cases where you don't need the functionality or when it is unusable because
aspell crashes during dictionary generation.
.TP
.BI "nocjk = " boolean
If this is set to true, specific East Asian (Chinese, Japanese, Korean)
character/word splitting is turned off. This will save a small amount of
CPU time if you have no CJK documents. If your document base does include
such text but you are not interested in searching it, setting
.I nocjk
may be a significant time and space saver.
.TP
.BI "cjkngramlen = " value
This lets you adjust the size of n-grams used for indexing CJK text. The
default value of 2 is probably appropriate in most cases. A value of 3
would allow more precision and efficiency on longer words, but the index
will be approximately twice as large.
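.sp
As a sketch, a configuration that accepts the larger index in exchange for
more precision on longer words would contain:
.sp
.nf
# Trade index size for precision. The default is 2.
cjkngramlen = 3

.fi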
.SH SEE ALSO
.PP
.BR recollindex (1),
.BR recoll (1)