src/INSTALL

...

* MP3: Recoll will use the id3info command from the id3lib package to
extract tag information. Without it, only the file names will be
indexed.

Text, HTML, mail folders, Openoffice and Scribus files are processed
internally. Lyx is used to index Lyx files. Many filters need sed and awk.

--------------------------------------------------------------------------

...
recoll and recollindex.

If the .recoll directory does not exist when recoll or recollindex is
started, it will be created with a set of empty configuration files.
recoll will give you a chance to edit the configuration file before
starting indexing; recollindex will proceed immediately. To avoid
mistakes, the automatic directory creation only occurs for the default
location, not if -c or RECOLL_CONFDIR was used (in these cases, you will
have to create the directory yourself).

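The rule for creating the configuration directory can be summarized in a short Python sketch. It is an illustration, not Recoll's code. The function names are hypothetical, and giving -c precedence over RECOLL_CONFDIR is an assumption, because the text above does not say which one wins when both are given.

```python
from pathlib import Path
from typing import Mapping, Optional, Tuple


def resolve_confdir(cli_dir: Optional[str], environ: Mapping[str, str],
                    home: str) -> Tuple[Path, bool]:
    """Return (configuration directory, is_default_location).

    Assumed precedence: a -c argument, then RECOLL_CONFDIR, then ~/.recoll.
    """
    if cli_dir:
        return Path(cli_dir), False
    env_dir = environ.get("RECOLL_CONFDIR")
    if env_dir:
        return Path(env_dir), False
    return Path(home) / ".recoll", True


def prepare_confdir(confdir: Path, is_default: bool) -> None:
    """Create a missing directory only when it is the default location."""
    if confdir.is_dir():
        return
    if not is_default:
        # With -c or RECOLL_CONFDIR, the user has to create the directory.
        raise FileNotFoundError(f"{confdir} does not exist, please create it")
    confdir.mkdir(parents=True)
    # Recoll would then populate it with empty configuration files.
```

A caller would typically pass os.environ and the user's home directory, then call prepare_confdir with the returned pair.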
All configuration files share the same format. For example, a short
extract of the main configuration file might look as follows:

# Space-separated list of directories to index.

...
in the next section.

The tilde character (~) is expanded in file names to the name of the
user's home directory.

White space is used for separation inside lists. List elements with
embedded spaces can be quoted using double-quotes.

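To make the list syntax concrete, the following Python sketch splits a list value according to the rules above. It uses the standard shlex module as an approximation. shlex also gives meaning to single quotes and backslashes, which Recoll's own parser may not do, so treat this as a model of the rules rather than a reimplementation. The function name parse_list_value is hypothetical.

```python
import os
import shlex
from typing import List


def parse_list_value(value: str) -> List[str]:
    """Split a configuration list value into its elements."""
    # White space separates elements; double quotes keep embedded spaces
    # together and are removed from the result. "#" is not a comment here.
    elements = shlex.split(value)
    # A leading ~ is expanded to the user's home directory; patterns such
    # as "*~" are left alone because the ~ is not leading.
    return [os.path.expanduser(e) for e in elements]
```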
4.4.1. Main configuration file

recoll.conf is the main configuration file. It defines things like what to
index (top directories and things to ignore), and the default character

...
dbdir

The name of the Xapian data directory. It will be created if
needed when the index is initialized. If this is not an absolute
path, it will be interpreted relative to the configuration
directory. The value can have embedded spaces, but leading or
trailing spaces will be trimmed. You cannot use quotes here.

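A hypothetical helper shows how a dbdir value could be turned into an absolute path under these rules: trimming, no quote handling, and resolution relative to the configuration directory. Applying tilde expansion here follows the general rule for file names and is an assumption for this particular variable.

```python
import os


def resolve_dbdir(value: str, confdir: str) -> str:
    """Return the absolute Xapian directory for a dbdir value."""
    # Leading and trailing spaces are trimmed; embedded spaces are kept
    # and quote characters are taken literally.
    path = os.path.expanduser(value.strip())
    if not os.path.isabs(path):
        # A relative value is interpreted relative to the configuration directory.
        path = os.path.join(confdir, path)
    return os.path.normpath(path)
```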
skippedNames

A space-separated list of patterns for names of files or
directories that should be completely ignored. The list defined in
the default file is:

skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
*~ recollrc

The list can be redefined for sub-directories, but is only
actually changed for the top level ones in topdirs.

The top-level directories are not affected by this list (that is,

...
index quite a few things that you do not want. On the other hand,
mail user agents like thunderbird usually store messages in hidden
directories, and you probably want this indexed. One possible
solution is to have .* in skippedNames, and add things like
~/.thunderbird or ~/.evolution in topdirs.

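The name filtering can be sketched in Python with fnmatch. This is a model under stated assumptions: case-sensitive shell-style matching is assumed (Recoll's exact matching flags may differ), per-subdirectory redefinition of the list is not modelled, and the helper names are hypothetical. The walk never tests the top directory itself against the patterns, which is what makes the combination of .* in skippedNames and ~/.thunderbird in topdirs work.

```python
import fnmatch
import os
from typing import Iterator, List


def is_skipped_name(name: str, patterns: List[str]) -> bool:
    """True if a file or directory name matches one of the skippedNames patterns."""
    return any(fnmatch.fnmatchcase(name, pat) for pat in patterns)


def walk_topdir(topdir: str, patterns: List[str]) -> Iterator[str]:
    """Yield the files under topdir that survive skippedNames filtering."""
    # The top directory itself is never checked, only the entries below it.
    for dirpath, dirnames, filenames in os.walk(os.path.expanduser(topdir)):
        # Prune skipped subdirectories in place so os.walk does not enter them.
        dirnames[:] = [d for d in dirnames if not is_skipped_name(d, patterns)]
        for f in filenames:
            if not is_skipped_name(f, patterns):
                yield os.path.join(dirpath, f)
```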
skippedPaths and daemSkippedPaths

A space-separated list of patterns for paths of files or
directories that should be skipped. There is no default in the
sample configuration file, but Recoll always adds the
configuration and database directories to the list.

skippedPaths is used by both batch and real-time indexing.
daemSkippedPaths can be used to specify things that should be
indexed at startup, but not monitored.

Example of use for skipping text files only in a specific
directory:

skippedPaths = ~/somedir/*.txt

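The difference between the two variables can be modelled as two predicates over full paths. Several assumptions apply. Python's fnmatch lets * match across / characters, which may not reflect Recoll's behaviour for deeper subdirectories. The automatic addition of the configuration and database directories is not modelled. The function names are hypothetical.

```python
import fnmatch
import os
from typing import List


def _matches(path: str, patterns: List[str]) -> bool:
    """True if the full path matches one of the patterns (~ expanded first)."""
    return any(fnmatch.fnmatchcase(path, os.path.expanduser(p)) for p in patterns)


def indexed(path: str, skipped_paths: List[str]) -> bool:
    """Batch indexing, and the startup pass of real-time indexing, use skippedPaths only."""
    return not _matches(path, skipped_paths)


def monitored(path: str, skipped_paths: List[str],
              daem_skipped_paths: List[str]) -> bool:
    """Real-time monitoring also honours daemSkippedPaths."""
    return not (_matches(path, skipped_paths) or _matches(path, daem_skipped_paths))
```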
loglevel,daemloglevel

Verbosity level for recoll and recollindex. A value of 4 lists
quite a lot of debug/information messages. A value of 2 lists only errors.

...
non-default entries, which will override those from the central
configuration file.

Please note that these entries must be placed under a [view] section.

If "Use desktop preferences to choose document editor" is checked in the
user preferences, all mimeview entries will be ignored except the one
labelled application/x-all (which is set to use xdg-open by default).

4.4.5. Examples of configuration adjustments

4.4.5.1. Adding an external viewer for a non-indexed type

Imagine that you have some kind of file which does not have indexable
content, but for which you would like to have a functional Edit link in
the result list (when found by file name). The file names end in .blob and
can be displayed by the blobviewer application.

You need two entries in the configuration files for this to work:

* In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the
following line:

.blob = application/x-blobapp

Note that the mime type is made up here, and you could call it
diesel/oil just the same.

* In $RECOLL_CONFDIR/mimeview under the [view] section:

application/x-blobapp = blobviewer %f

We are supposing that blobviewer wants a file name parameter here; you
would use %u if it liked URLs better.

If you just wanted to change the application used by Recoll to display a
mime type which it already knows, you would just need to edit mimeview.
The entries you add in your personal file override those in the central
configuration, which you do not need to alter.

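The way a viewer entry is chosen and expanded can be sketched as follows. Only the behaviour stated in this section is modelled: the application/x-all override when desktop preferences are enabled, %f for a file name and %u for a URL. The function is hypothetical. Substituting only whole-word %f/%u arguments is a simplification.

```python
import shlex
from pathlib import Path
from typing import List, Mapping


def viewer_command(mimetype: str, path: str, view: Mapping[str, str],
                   use_desktop: bool) -> List[str]:
    """Build the argument list for opening path from [view]-style entries."""
    # With the desktop preference checked, only application/x-all is consulted.
    # A missing entry raises KeyError.
    template = view["application/x-all"] if use_desktop else view[mimetype]
    args = []
    for word in shlex.split(template):
        if word == "%f":
            args.append(path)                              # plain file name
        elif word == "%u":
            args.append(Path(path).absolute().as_uri())    # file:// URL
        else:
            args.append(word)
    return args
```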
4.4.5.2. Adding indexing support for a new file type

Let us now imagine that the above .blob files actually contain indexable
text and that you know how to extract it with a command line program.
Getting Recoll to index the files is easy. You need to perform the above
alteration, and also to add data to the mimeconf file (typically in
~/.recoll/mimeconf):

* Under the [index] section, add the following line (more about the
rclblob indexing script later):

application/x-blobapp = exec rclblob

* Under the [icons] section, you should choose an icon to be displayed
for the files inside the result lists. Icons are normally 64x64 pixel
PNG files which live in /usr/[local/]share/recoll/images.

* Under the [categories] section, you should add the mime type where it
makes sense (you can also create a category). Categories may be used
for filtering in advanced search.

The rclblob filter should be an executable program or script which exists
inside /usr/[local/]share/recoll/filters. It will be given a file name as
argument and should output the text contents in HTML format on the
standard output.

The HTML can be very minimal, like the following example:

<html><head>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
</head>
<body>some text content</body></html>

You should take care to escape some characters inside the text by
transforming them into appropriate entities. "&" should be transformed
into "&amp;", "<" should be transformed into "&lt;".

The character set needs to be specified in the header. It does not need to
be UTF-8 (Recoll will take care of translating it), but it must be
accurate for good results.

Recoll will also make use of other header fields if they are present:
title, description, keywords.

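Here is a minimal sketch of an rclblob filter written in Python. For illustration only, it assumes that a .blob file simply holds UTF-8 text; a real filter would run whatever extraction command the format needs. It escapes & and < (html.escape also escapes >, which is harmless). It always writes UTF-8 bytes with a matching charset declaration, because the declared character set must be accurate. The optional title argument shows where one of the extra header fields would go.

```python
#!/usr/bin/env python3
"""rclblob: hypothetical Recoll filter printing a .blob file as minimal HTML."""
import html
import sys


def blob_to_html(text: str, title: str = "") -> str:
    """Wrap extracted text in the minimal HTML structure described above."""
    head = '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">\n'
    if title:
        # Optional header field that Recoll can use when present.
        head += "<title>" + html.escape(title, quote=False) + "</title>\n"
    # quote=False: & becomes &amp;, < becomes &lt;, > becomes &gt;.
    body = html.escape(text, quote=False)
    return "<html><head>\n" + head + "</head>\n<body>" + body + "</body></html>\n"


def main(path: str) -> int:
    # Assumption for this sketch: the .blob file is plain UTF-8 text.
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    # Write UTF-8 bytes so the output matches the declared charset,
    # whatever the locale encoding of stdout is.
    sys.stdout.buffer.write(blob_to_html(text).encode("utf-8"))
    return 0


if __name__ == "__main__" and len(sys.argv) == 2:
    # Recoll passes the file name as the single argument.
    sys.exit(main(sys.argv[1]))
```

Installed as an executable file named rclblob in the filters directory, it would match the exec rclblob entry in mimeconf.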
The easiest way to write a new filter is probably to start from an
existing one.

--------------------------------------------------------------------------