a/src/doc/man/recoll.conf.5 b/src/doc/man/recoll.conf.5
1
.\" $Id: recoll.conf.5,v 1.5 2007-07-13 10:18:49 dockes Exp $ (C) 2005 J.F.Dockes\$
1
.\" $Id: recoll.conf.5,v 1.5 2007-07-13 10:18:49 dockes Exp $ (C) 2005 J.F.Dockes\$
2
.TH RECOLL.CONF 5 "8 January 2006"
2
.TH RECOLL.CONF 5 "8 January 2006"
3
.SH NAME
3
.SH NAME
4
recoll.conf \- main personal configuration file for Recoll
4
recoll.conf \- main personal configuration file for Recoll
5
.SH DESCRIPTION
5
.SH DESCRIPTION
6
This file defines the indexation configuration for the Recoll full-text search
6
This file defines the index configuration for the Recoll full-text search
7
system.
7
system.
8
.LP
8
.LP
9
The system-wide configuration file is normally located inside
9
The system-wide configuration file is normally located inside
10
/usr/[local]/share/recoll/examples. Any parameter set in the common file
10
/usr/[local]/share/recoll/examples. Any parameter set in the common file
11
may be overridden by setting it in the personal configuration file, by default:
11
may be overridden by setting it in the personal configuration file, by default:
12
.IR $HOME/.recoll/recoll.conf
12
.IR $HOME/.recoll/recoll.conf
13
.LP
13
.LP
14
Please note that while we try to keep this manual page reasonably up to date, it
14
Please note that while we try to keep this manual page reasonably up to date, it
15
will frequently lag the current state of the software. The best source of
15
will frequently lag the current state of the software. The best source of
16
information about the configuration are the comments in the configuration
16
information about the configuration are the comments in the system-wide
17
file.
17
configuration file.
18
18
19
.LP
19
.LP
20
A short extract of the file might look as follows:
20
A short extract of the file might look as follows:
21
.IP
21
.IP
22
.nf
22
.nf
...
...
42
Empty lines or lines beginning with # are ignored.
42
Empty lines or lines beginning with # are ignored.
43
.LP
43
.LP
44
Assignment lines are in the form 'name = value'.
44
Assignment lines are in the form 'name = value'.
45
.LP
45
.LP
46
Section lines allow redefining a parameter for a directory subtree. Some of
46
Section lines allow redefining a parameter for a directory subtree. Some of
47
the parameters used for indexaction are looked up hierarchically from the
47
the parameters used for indexing are looked up hierarchically from the
48
more to the less specific. Not all parameters can be meaningfully
48
more to the less specific. Not all parameters can be meaningfully
49
redefined; this is specified for each in the next section.
49
redefined; this is specified for each in the next section.
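The "more to less specific" lookup described above can be sketched as follows. This is only an illustration, not Recoll's code: the `lookup` function, the dictionary layout (section name mapped to its parameters, with `""` standing for the global part of the file) and the sample paths and values are all assumptions made for the example.

```python
import posixpath

def lookup(config, name, path):
    """Return the value of `name` for `path`, preferring the deepest section.

    `config` is a hypothetical in-memory form of the file: a dict mapping a
    directory (section) to its parameters, with "" holding global values.
    """
    directory = path
    while True:
        section = config.get(directory, {})
        if name in section:
            return section[name]
        if directory in ("/", ""):
            break
        directory = posixpath.dirname(directory)
    # Nothing more specific was found: fall back to the global value.
    return config.get("", {}).get(name)

# Hypothetical configuration: one global value, one subtree override.
config = {
    "": {"defaultcharset": "iso-8859-1"},
    "/home/me/mail": {"defaultcharset": "utf-8"},
}
in_subtree = lookup(config, "defaultcharset", "/home/me/mail/inbox/msg1")
elsewhere = lookup(config, "defaultcharset", "/home/me/docs/notes.txt")
```

Here `in_subtree` picks up the subtree override and `elsewhere` falls back to the global value.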
50
.LP
50
.LP
51
The tilde character (~) is expanded in file names to the name of the user's
51
The tilde character (~) is expanded in file names to the name of the user's
52
home directory.
52
home directory.
...
...
55
embedded spaces can be quoted with double-quotes.
55
embedded spaces can be quoted with double-quotes.
56
.SH OPTIONS
56
.SH OPTIONS
57
.TP
57
.TP
58
.BI "topdirs = "  directories
58
.BI "topdirs = "  directories
59
Specifies the list of directories to index (recursively). 
59
Specifies the list of directories to index (recursively). 
60
.TP
61
.BI "dbdir = " directory
62
The name of the Xapian database directory. It will be created if needed
63
when the database is initialized. If this is not an absolute pathname, it
64
will be taken relative to the configuration directory.
65
.TP
60
.TP
66
.BI "skippedNames = " patterns
61
.BI "skippedNames = " patterns
67
A space-separated list of patterns for names of files or directories that
62
A space-separated list of patterns for names of files or directories that
68
should be completely ignored. The list defined in the default file is:
63
should be completely ignored. The list defined in the default file is:
69
.sp
64
.sp
...
...
76
.I topdirs
71
.I topdirs
77
.TP
72
.TP
78
.BI "skippedPaths = " patterns
73
.BI "skippedPaths = " patterns
79
A space-separated list of patterns for paths the indexer should not descend
74
A space-separated list of patterns for paths the indexer should not descend
80
into. Together with topdirs, this allows pruning the indexed tree to one's
75
into. Together with topdirs, this allows pruning the indexed tree to one's
81
content. daemSkippedPaths can be used to define a specific value for the
76
content.
82
real time indexing monitor.
77
.B daemSkippedPaths 
78
can be used to define a specific value for the real time indexing monitor.
79
.TP
80
.BI "skippedPathsFnmPathname = " 0/1
81
The values in the *skippedPaths variables are matched by default with
82
fnmatch(3), with the FNM_PATHNAME and FNM_LEADING_DIR flags. This means
83
that '/' characters must be matched explicitly. You can set
84
skippedPathsFnmPathname to 0 to disable the use of FNM_PATHNAME (meaning
85
that /*/dir3 will match /dir1/dir2/dir3). 
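To make the effect of FNM_PATHNAME concrete, here is a small Python sketch. It does not call fnmatch(3) and is not how Recoll does the matching: both helper functions are hypothetical, the component-wise split only approximates FNM_PATHNAME and FNM_LEADING_DIR, and bracket expressions containing '/' are not handled.

```python
import fnmatch

def matches_with_pathname(pattern, path):
    """Component-wise match: wildcards never cross a '/' (FNM_PATHNAME-like).

    Extra trailing components of `path` are ignored, which approximates
    FNM_LEADING_DIR.
    """
    pattern_parts = pattern.split("/")
    path_parts = path.split("/")
    if len(path_parts) < len(pattern_parts):
        return False
    return all(
        fnmatch.fnmatchcase(part, pat)
        for pat, part in zip(pattern_parts, path_parts)
    )

def matches_without_pathname(pattern, path):
    """Whole-string match: '*' may span '/' characters."""
    return fnmatch.fnmatchcase(path, pattern)

strict = matches_with_pathname("/*/dir3", "/dir1/dir2/dir3")
relaxed = matches_without_pathname("/*/dir3", "/dir1/dir2/dir3")
```

With the example from the text, `strict` is false (the single `*` cannot cover `dir1/dir2`) while `relaxed` is true.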
83
.TP
86
.TP
84
.BI "followLinks = " boolean
87
.BI "followLinks = " boolean
85
Specifies if the indexer should follow
88
Specifies if the indexer should follow
86
symbolic links while walking the file tree. The default is
89
symbolic links while walking the file tree. The default is
87
to ignore symbolic links to avoid multiple indexing of
90
to ignore symbolic links to avoid multiple indexing of
...
...
91
.I topdirs
94
.I topdirs
92
members by using sections. It can not be changed below the
95
members by using sections. It can not be changed below the
93
.I topdirs
96
.I topdirs
94
level.
97
level.
95
.TP
98
.TP
96
.BI "loglevel = " value
99
.BI "indexedmimetypes = " list
97
Verbosity level for recoll and recollindex. A value of 4 lists quite a lot of
100
Recoll normally indexes any file which it knows how to read. This list lets
98
debug/information messages. 3 lists only errors. 
101
you restrict the indexed mime types to what you specify. If the variable is
99
.B daemloglevel
102
unspecified or the list empty (the default), all supported types are
100
can be used to specify a different value for the real-time indexing daemon.
103
processed.
104
.TP
105
.BI "compressedfilemaxkbs = " value
106
Size limit for compressed (.gz or .bz2) files. These need to be
107
decompressed in a temporary directory for identification, which can be very
108
wasteful if 'uninteresting' big compressed files are present.  Negative
109
means no limit, 0 means no processing of any compressed file. Defaults 
110
to \-1.
111
.TP
112
.BI "textfilemaxmbs = " value
113
Maximum size for text files. Very big text files are often uninteresting
114
logs. Set to -1 to disable (default 20MB). 
115
.TP
116
.BI "textfilepagekbs = " value
117
If this is set to other than -1, text files will be indexed as multiple
118
documents of the given page size. This may be useful if you do want to
119
index very big text files as it will both reduce memory usage at index time
120
and help with loading data to the preview window. A size of a few megabytes
121
would seem reasonable (default: 1000, i.e. 1 MB).
122
.TP
123
.BI "membermaxkbs = " "value in kilobytes"
124
This defines the maximum size for an archive member (zip, tar or rar at
125
the moment). Bigger entries will be skipped. Current default: 50000 (50 MB).
126
.TP
127
.BI "indexallfilenames = " boolean
128
Recoll indexes file names into a special section of the database to allow
129
searches for specific file names using wildcards. This parameter decides if
130
file name indexing is performed only for files with mime types that would
131
qualify them for full text indexing, or for all files inside
132
the selected subtrees, independent of mime type.
133
.TP
134
.BI "usesystemfilecommand = " boolean
135
Decide if we use the 
136
.B "file \-i"
137
system command as a final step for determining the mime type for a file
138
(the main procedure uses suffix associations as defined in the 
139
.B mimemap 
140
file). This can be useful for files with suffixless names, but it will
141
also cause the indexing of many bogus "text" files.
101
.TP
142
.TP 
102
.BI "logfilename = " file
143
.BI "processbeaglequeue = " 0/1
103
Where should the messages go. 'stderr' can be used as a special value.
144
If this is set, process the directory where Beagle Web browser plugins copy
104
.B daemlogfilename
145
visited pages for indexing. Of course, Beagle MUST NOT be running, else
105
can be used to specify a different value for the real-time indexing daemon.
146
things will behave strangely. 
147
.TP 
148
.BI "beaglequeuedir = " "directory path"
149
The path to the Beagle indexing queue. This is hard-coded in the Beagle
150
plugin as ~/.beagle/ToIndex so there should be no need to change it. 
151
.TP 
152
.BI "indexStripChars = " 0/1
153
Decide if we strip characters of diacritics and convert them to lower-case
154
before terms are indexed. If we don't, searches sensitive to case and
155
diacritics can be performed, but the index will be bigger, and some
156
marginal weirdness may sometimes occur. The default is a stripped index
157
(indexStripChars = 1) for now. When using multiple indexes for a search,
158
this parameter must be defined identically for all. Changing the value
159
implies an index reset.
160
.TP 
161
.BI "maxTermExpand = " value
162
Maximum expansion count for a single term (e.g.: when using wildcards). The
163
default of 10000 is reasonable and will avoid queries that appear frozen
164
while the engine is walking the term list. 
165
.TP 
166
.BI "maxXapianClauses = " value
167
Maximum number of elementary clauses we can add to a single Xapian
168
query. In some cases, the result of term expansion can be multiplicative,
169
and we want to avoid using excessive memory. The default of 100 000 should
170
be both high enough in most cases and compatible with current typical
171
hardware configurations. 
172
.TP 
173
.BI "nonumbers = " 0/1
174
If this is set to true, no terms will be generated for numbers. For example
175
"123", "1.5e6", 192.168.1.4, would not be indexed ("value123" would still
176
be). Numbers are often quite interesting to search for, and this should
177
probably not be set except for special situations, e.g., scientific documents
178
with huge amounts of numbers in them. This can only be set for a whole
179
index, not for a subtree. 
180
.TP
181
.BI "nocjk = " boolean
182
If this is set to true, specific East Asian (Chinese, Korean, Japanese)
183
characters/word splitting is turned off. This will save a small amount of
184
cpu if you have no CJK documents. If your document base does include such
185
text but you are not interested in searching it, setting
186
.I nocjk
187
may be a significant time and space saver.
188
.TP
189
.BI "cjkngramlen = " value
190
This lets you adjust the size of n-grams used for indexing CJK text. The
191
default value of 2 is probably appropriate in most cases. A value of 3
192
would allow more precision and efficiency on longer words, but the index
193
will be approximately twice as large.
106
.TP
194
.TP
107
.BI "indexstemminglanguages = " languages
195
.BI "indexstemminglanguages = " languages
108
A list of languages for which the stem expansion databases will be
196
A list of languages for which the stem expansion databases will be
109
built. See recollindex(1) for possible values.
197
built. See recollindex(1) for possible values.
110
.TP
198
.TP
111
.BI "defaultcharset = " charset
199
.BI "defaultcharset = " charset
112
The name of the character set used for files that do not contain a
200
The name of the character set used for files that do not contain a
113
character set definition (ie: plain text files). This can be redefined for
201
character set definition (ie: plain text files). This can be redefined for
114
any subdirectory.
202
any subdirectory.
203
.TP 
204
.BI "unac_except_trans = " "list of utf-8 groups"
205
This is a list of characters, encoded in UTF-8, which should be handled
206
specially when converting text to unaccented lowercase. For example, in
207
Swedish, the letter "a with diaeresis" has full alphabet citizenship and
208
should not be turned into an a. 
209
.br
210
Each element in the space-separated list has the special character as first
211
element and the translation following. The handling of both the lowercase
212
and upper-case versions of a character should be specified, as membership
213
in the list will turn off both standard accent and case processing.
214
.br
215
Note that the translation is not limited to a single character.
216
.br
217
This parameter cannot be redefined for subdirectories, it is global,
218
because there is no way to do otherwise when querying. If you have document
219
sets which would need different values, you will have to index and query
220
them separately.
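A rough sketch of how such a list could be read, following the description above (first character of each group is the special character, the rest is its translation). The function name and the sample value are illustrative assumptions, and the code assumes precomposed UTF-8 characters so that the first code point of a group is the whole special character.

```python
def parse_unac_except_trans(value):
    """Map each special character to its translation (the rest of the group)."""
    table = {}
    for group in value.split():
        table[group[0]] = group[1:]
    return table

# Hypothetical value: keep Swedish a-ring, a-diaeresis and o-diaeresis as
# distinct letters (still lowercased), and expand sharp s to two characters.
table = parse_unac_except_trans("åå Åå ää Ää öö Öö ßss")
```

Both case variants are listed explicitly, and the `ß` entry shows a translation that is longer than one character.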
221
.TP
222
.BI "maildefcharset = " "character set name"
223
This can be used to define the default character set specifically for email
224
messages which don't specify it. This is mainly useful for readpst (libpst)
225
dumps, which are utf-8 but do not say so. 
226
.TP
227
.BI "localfields = " "fieldname = value:..."
228
This allows setting fields for all documents under a given
229
directory. Typical usage would be to set an "rclaptg" field, to be used in
230
mimeview to select a specific viewer. If several fields are to be set, they
231
should be separated with a colon (':') character (which there is currently
232
no way to escape). For example: localfields = rclaptg=gnus:other = val, then select
233
a specific viewer with mimetype|tag=... in mimeview. 
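The colon-separated "name = value" format can be illustrated with a short parser. This is a sketch under the rules stated above, not Recoll's implementation; `parse_localfields` is a made-up name, and since the text says a colon cannot be escaped, the sketch simply splits on every colon.

```python
def parse_localfields(value):
    """Split 'name = value:name = value' into a dict, trimming white space."""
    fields = {}
    for item in value.split(":"):
        name, sep, val = item.partition("=")
        if sep:  # ignore fragments that carry no '='
            fields[name.strip()] = val.strip()
    return fields

# The example value from the text above.
fields = parse_localfields("rclaptg=gnus:other = val")
```

For the example value this yields two fields, `rclaptg` and `other`.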
234
.TP
235
.BI "dbdir = " directory
236
The name of the Xapian database directory. It will be created if needed
237
when the database is initialized. If this is not an absolute pathname, it
238
will be taken relative to the configuration directory.
239
.TP
240
.BI "idxstatusfile = " "file path"
241
The name of the scratch file where the indexer process updates its
242
status. Default: idxstatus.txt inside the configuration directory. 
115
.TP
243
.TP
116
.BI "maxfsoccuppc = " percentnumber
244
.BI "maxfsoccuppc = " percentnumber
117
Maximum file system occupation before we
245
Maximum file system occupation before we
118
stop indexing. The value is a percentage, corresponding to
246
stop indexing. The value is a percentage, corresponding to
119
what the "Capacity" df output column shows.  The default
247
what the "Capacity" df output column shows.  The default
120
value is 0, meaning no checking.
248
value is 0, meaning no checking.
249
.TP
250
.BI "mboxcachedir = " "directory path"
251
The directory where mbox message offsets cache files are held. This is
252
normally $RECOLL_CONFDIR/mboxcache, but it may be useful to share a
253
directory between different configurations. 
254
.TP
255
.BI "mboxcacheminmbs = " "value in megabytes"
256
The minimum mbox file size over which we cache the offsets. There is really no sense in caching offsets for small files. The default is 5 MB.
257
.TP
258
.BI "webcachedir = " "directory path"
259
This is only used by the Beagle web browser plugin indexing code, and
260
defines where the cache for visited pages will live. Default:
261
$RECOLL_CONFDIR/webcache
262
.TP
263
.BI "webcachemaxmbs = " "value in megabytes"
264
This is only used by the Beagle web browser plugin indexing code, and
265
defines the maximum size for the web page cache. Default: 40 MB. 
121
.TP
266
.TP
122
.BI "idxflushmb = " megabytes
267
.BI "idxflushmb = " megabytes
123
Threshold (megabytes of new text data)
268
Threshold (megabytes of new text data)
124
where we flush from memory to disk index. Setting this can
269
where we flush from memory to disk index. Setting this can
125
help control memory usage. A value of 0 means no explicit
270
help control memory usage. A value of 0 means no explicit
126
flushing, letting Xapian use its own default, which is
271
flushing, letting Xapian use its own default, which is
127
flushing every 10000 documents (or XAPIAN_FLUSH_THRESHOLD), meaning that
272
flushing every 10000 documents (or XAPIAN_FLUSH_THRESHOLD), meaning that
128
memory usage depends on average document size. The default value is 10.
273
memory usage depends on average document size. The default value is 10.
129
.TP
274
.TP
275
.BI "autodiacsens = " 0/1
276
If the index is not stripped, decide if we automatically trigger diacritics
277
sensitivity if the search term has accented characters (not in
278
unac_except_trans). Else you need to use the query language and the D
279
modifier to specify diacritics sensitivity. Default is no. 
280
.TP
281
.BI "autocasesens = " 0/1
282
If the index is not stripped, decide if we automatically trigger character
283
case sensitivity if the search term has upper-case characters in any but
284
the first position. Else you need to use the query language and the C
285
modifier to specify character-case sensitivity. Default is yes. 
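The rule described for autocasesens (an upper-case character anywhere except the first position) can be written in a few lines. This only restates the rule from the text; the function name is hypothetical and the actual check in Recoll may differ in detail.

```python
def triggers_case_sensitivity(term):
    """True if `term` has an upper-case character after its first position."""
    return any(ch.isupper() for ch in term[1:])

capitalized = triggers_case_sensitivity("Recoll")
mixed = triggers_case_sensitivity("reColl")
```

A merely capitalized word such as "Recoll" does not trigger case sensitivity, while "reColl" does.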
286
.TP
287
.BI "loglevel = " value
288
Verbosity level for recoll and recollindex. A value of 4 lists quite a lot of
289
debug/information messages. 3 lists only errors. 
290
.B daemloglevel
291
can be used to specify a different value for the real-time indexing daemon.
292
.TP
293
.BI "logfilename = " file
294
Where should the messages go. 'stderr' can be used as a special value.
295
.B daemlogfilename
296
can be used to specify a different value for the real-time indexing daemon.
297
.TP
298
.BI "mondelaypatterns = " "list of patterns"
299
This allows specifying wildcard path patterns (processed with fnmatch(3) with
300
0 flag), to match files which change too often and for which a delay should
301
be observed before re-indexing. This is a space-separated list, each entry
302
being a pattern and a time in seconds, separated by a colon. You can use
303
double quotes if a path entry contains white space. Example: 
304
.sp
305
mondelaypatterns = *.log:20 "this one has spaces*:10"
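As a sketch of how the example value breaks down into (pattern, delay) pairs, the following uses Python's `shlex` for the double-quote handling. This is an approximation: Recoll's own parser may treat quoting differently, and `parse_mondelaypatterns` is a name invented for the example.

```python
import shlex

def parse_mondelaypatterns(value):
    """Return a list of (pattern, seconds) pairs from the parameter value."""
    entries = []
    for entry in shlex.split(value):
        # The delay follows the last colon, so the pattern may contain colons.
        pattern, _, seconds = entry.rpartition(":")
        entries.append((pattern, int(seconds)))
    return entries

# The value from the example line above.
delays = parse_mondelaypatterns('*.log:20 "this one has spaces*:10"')
```

The quoted entry stays in one piece, so the pattern with spaces is paired with its 10 second delay.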
306
.TP                  
307
.BI "monixinterval = " "value in seconds"
308
Minimum interval (seconds) for processing the indexing queue. The real time
309
monitor does not process each event when it comes in, but will wait this
310
time for the queue to accumulate to diminish overhead and in order to
311
aggregate multiple events to the same file. Default: 30 seconds. 
312
.TP
313
.BI "monauxinterval = " "value in seconds"
314
Period (in seconds) at which the real time monitor will regenerate the
315
auxiliary databases (spelling, stemming) if needed. The default is one
316
hour. 
317
.TP
318
.BI "monioniceclass, monioniceclassdata"
319
These allow defining the ionice class and data used by the indexer (default
320
class 3, no data). 
321
.TP
322
.BI "filtermaxseconds = " "value in seconds"
323
Maximum filter execution time, after which it is aborted. Some postscript
324
programs just loop... 
325
.TP
130
.BI "filtersdir = " directory
326
.BI "filtersdir = " directory
131
A directory to search for the external filter scripts used to index some
327
A directory to search for the external filter scripts used to index some
132
types of files. The value should not be changed, except if you want to
328
types of files. The value should not be changed, except if you want to
133
modify one of the default scripts. The value can be redefined for any
329
modify one of the default scripts. The value can be redefined for any
134
subdirectory. 
330
subdirectory. 
...
...
136
.BI "iconsdir = " directory
332
.BI "iconsdir = " directory
137
The name of the directory where 
333
The name of the directory where 
138
.B recoll
334
.B recoll
139
result list icons are stored. You can change this if you want different
335
result list icons are stored. You can change this if you want different
140
images.
336
images.
141
.TP
142
.BI "guesscharset = " boolean
143
Try to guess the character set of files if no internal value is available
144
(ie: for plain text files). This does not work well in general, and should
145
probably not be used.
146
.TP
147
.BI "usesystemfilecommand = " boolean
148
Decide if we use the 
149
.B "file \-i"
150
system command as a final step for determining the mime type for a file
151
(the main procedure uses suffix associations as defined in the 
152
.B mimemap 
153
file). This can be useful for files with suffixless names, but it will
154
also cause the indexation of many bogus "text" files.
155
.TP
156
.BI "indexedmimetypes = " list
157
Recoll normally indexes any file which it knows how to read. This list lets
158
you restrict the indexed mime types to what you specify. If the variable is
159
unspecified or the list empty (the default), all supported types are
160
processed.
161
.TP
162
.BI "compressedfilemaxkbs = " value
163
Size limit for compressed (.gz or .bz2) files. These need to be
164
decompressed in a temporary directory for identification, which can be very
165
wasteful if 'uninteresting' big compressed files are present.  Negative
166
means no limit, 0 means no processing of any compressed file. Defaults 
167
to \-1.
168
.TP
169
.BI "indexallfilenames = " boolean
170
Recoll indexes file names into a special section of the database to allow
171
specific file names searches using wild cards. This parameter decides if
172
file name indexing is performed only for files with mime types that would
173
qualify them for full text indexation, or for all files inside
174
the selected subtrees, independent of mime type.
175
.TP
337
.TP
176
.BI "idxabsmlen = " value
338
.BI "idxabsmlen = " value
177
Recoll stores an abstract for each indexed file inside the database. The
339
Recoll stores an abstract for each indexed file inside the database. The
178
text can come from an actual 'abstract' section in the document or will
340
text can come from an actual 'abstract' section in the document or will
179
just be the beginning of the document. It is stored in the index so that it
341
just be the beginning of the document. It is stored in the index so that it
...
...
196
.BI "noaspell = " boolean
358
.BI "noaspell = " boolean
197
If this is set, the aspell dictionary generation is turned off. Useful for
359
If this is set, the aspell dictionary generation is turned off. Useful for
198
cases where you don't need the functionality or when it is unusable because
360
cases where you don't need the functionality or when it is unusable because
199
aspell crashes during dictionary generation.
361
aspell crashes during dictionary generation.
200
.TP
362
.TP
201
.BI "nocjk = " boolean
363
.BI "mhmboxquirks = " flags
202
If this set to true, specific east asian (Chinese Korean Japanese)
364
This allows defining location-related quirks for the mailbox
203
characters/word splitting is turned off. This will save a small amount of
365
handler. Currently only the tbird flag is defined, and it should be set for
204
cpu if you have no CJK documents. If your document base does include such
366
directories which hold Thunderbird data, as their folder format is weird. 
205
text but you are not interested in searching it, setting
367
206
.I nocjk
207
may be a significant time and space saver.
208
.TP
209
.BI "cjkngramlen = " value
210
This lets you adjust the size of n-grams used for indexing CJK text. The
211
default value of 2 is probably appropriate in most cases. A value of 3
212
would allow more precision and efficiency on longer words, but the index
213
will be approximately twice as large.
214
.SH SEE ALSO
368
.SH SEE ALSO
215
.PP 
369
.PP 
216
recollindex(1) recoll(1)
370
recollindex(1) recoll(1)