|
a/src/README |
|
b/src/README |
|
... |
|
... |
10 |
|
10 |
|
11 |
Copyright (c) 2005 Jean-Francois Dockes
|
11 |
Copyright (c) 2005 Jean-Francois Dockes
|
12 |
|
12 |
|
13 |
This document introduces full text search notions and describes the
|
13 |
This document introduces full text search notions and describes the
|
14 |
installation and use of the Recoll application. It currently describes
|
14 |
installation and use of the Recoll application. It currently describes
|
15 |
Recoll 1.12.
|
15 |
Recoll 1.12-1.13.
|
|
|
16 |
|
|
|
17 |
[ Split HTML / Single HTML ]
|
16 |
|
18 |
|
17 |
----------------------------------------------------------------------
|
19 |
----------------------------------------------------------------------
|
18 |
|
20 |
|
19 |
Table of Contents
|
21 |
Table of Contents
|
20 |
|
22 |
|
|
... |
|
... |
38 |
|
40 |
|
39 |
2.3. Indexing configuration
|
41 |
2.3. Indexing configuration
|
40 |
|
42 |
|
41 |
2.3.1. The indexing configuration GUI
|
43 |
2.3.1. The indexing configuration GUI
|
42 |
|
44 |
|
|
|
45 |
2.4. Using Beagle WEB browser plugins
|
|
|
46 |
|
43 |
2.4. Periodic indexing
|
47 |
2.5. Periodic indexing
|
44 |
|
48 |
|
45 |
2.4.1. Starting indexing
|
49 |
2.5.1. Starting indexing
|
46 |
|
50 |
|
47 |
2.4.2. Using cron to automate indexing
|
51 |
2.5.2. Using cron to automate indexing
|
48 |
|
52 |
|
49 |
2.5. Real time indexing
|
53 |
2.6. Real time indexing
|
50 |
|
54 |
|
51 |
3. Searching with the Qt graphical user interface
|
55 |
3. Searching with the Qt graphical user interface
|
52 |
|
56 |
|
53 |
3.1. Simple search
|
57 |
3.1. Simple search
|
54 |
|
58 |
|
|
... |
|
... |
80 |
|
84 |
|
81 |
3.11.3. Others
|
85 |
3.11.3. Others
|
82 |
|
86 |
|
83 |
3.12. Customizing the search interface
|
87 |
3.12. Customizing the search interface
|
84 |
|
88 |
|
|
|
89 |
3.12.1. The result list paragraph format
|
|
|
90 |
|
85 |
4. Searching with the KDE KIO slave
|
91 |
4. Searching with the KDE KIO slave
|
86 |
|
92 |
|
87 |
4.1. What's this
|
93 |
4.1. What's this
|
88 |
|
94 |
|
89 |
4.2. Searchable documents
|
95 |
4.2. Searchable documents
|
|
... |
|
... |
104 |
|
110 |
|
105 |
6.3.2. Python interface
|
111 |
6.3.2. Python interface
|
106 |
|
112 |
|
107 |
7. Installation
|
113 |
7. Installation
|
108 |
|
114 |
|
109 |
7.1. Installing a prebuilt copy
|
115 |
7.1. Installing a binary copy
|
110 |
|
116 |
|
111 |
7.1.1. Installing through a package system
|
117 |
7.1.1. Installing through a package system
|
112 |
|
118 |
|
113 |
7.1.2. Installing a prebuilt Recoll
|
119 |
7.1.2. Installing a prebuilt Recoll
|
114 |
|
120 |
|
|
... |
|
... |
271 |
|
277 |
|
272 |
|
278 |
|
273 |
Recoll knows about quite a few different document types. The parameters
|
279 |
Recoll knows about quite a few different document types. The parameters
|
274 |
for document types recognition and processing are set in configuration
|
280 |
for document types recognition and processing are set in configuration
|
275 |
files. Most file types, like HTML or word processing files, only hold one
|
281 |
files. Most file types, like HTML or word processing files, only hold one
|
276 |
document. Some file types, like mail folder files can hold many
|
282 |
document. Some file types, like mail folder files, can hold many
|
277 |
individually indexed documents.
|
283 |
individually indexed documents.
|
278 |
|
284 |
|
279 |
Recoll indexing processes plain text, HTML, openoffice and e-mail files
|
285 |
Recoll indexing processes plain text, HTML, openoffice and e-mail files
|
280 |
internally.
|
286 |
internally (a few more actually).
|
281 |
|
287 |
|
282 |
Other file types (ie: postscript, pdf, ms-word, rtf ...) need external
|
288 |
Other file types (ie: postscript, pdf, ms-word, rtf ...) need external
|
283 |
applications for preprocessing. The list is in the installation section.
|
289 |
applications for preprocessing. The list is in the installation section.
|
284 |
After every indexing operation, Recoll updates a list of commands that
|
290 |
After every indexing operation, Recoll updates a list of commands that
|
285 |
would be needed for indexing existing file types. This list can be
|
291 |
would be needed for indexing existing file types. This list can be
|
|
... |
|
... |
293 |
system to separate databases. You can do this by using multiple
|
299 |
system to separate databases. You can do this by using multiple
|
294 |
configuration directories, each indexing a file system area to a specific
|
300 |
configuration directories, each indexing a file system area to a specific
|
295 |
database. See the section about using multiple databases for more
|
301 |
database. See the section about using multiple databases for more
|
296 |
information on multiple configurations and indexes.
|
302 |
information on multiple configurations and indexes.
|
297 |
|
303 |
|
|
|
304 |
In the rare case where the index becomes corrupted (which can signal
|
|
|
305 |
itself by weird search results or crashes), the index files need to be
|
|
|
306 |
erased before restarting a clean indexing pass. Just delete the xapiandb
|
|
|
307 |
directory (see next section), or, alternatively, start the next
|
|
|
308 |
recollindex with the -z option, which will reset the database before
|
|
|
309 |
indexing.
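
As a rough sketch of the first method, assuming the default layout where
the index lives in the xapiandb subdirectory of the configuration
directory (the function name reset_recoll_index is made up for this
illustration, and a relocated index would need a different path):

```shell
# Hypothetical helper: discard the index files so that the next indexing
# pass starts from scratch. Only the xapiandb subdirectory is removed;
# the configuration files next to it are left alone.
reset_recoll_index() {
    # Use the argument if given, else RECOLL_CONFDIR, else ~/.recoll
    confdir="${1:-${RECOLL_CONFDIR:-$HOME/.recoll}}"
    if [ -d "$confdir/xapiandb" ]; then
        rm -rf "$confdir/xapiandb"
    fi
}
```

Running recollindex with the -z option should give the same result
without removing the files by hand.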
|
|
|
310 |
|
298 |
----------------------------------------------------------------------
|
311 |
----------------------------------------------------------------------
|
299 |
|
312 |
|
300 |
2.2. Index storage
|
313 |
2.2. Index storage
|
301 |
|
314 |
|
302 |
The default location for the index data is the xapiandb subdirectory of
|
315 |
The default location for the index data is the xapiandb subdirectory of
|
|
... |
|
... |
327 |
configuration section). This method would mainly be of use if you
|
340 |
configuration section). This method would mainly be of use if you
|
328 |
wanted to keep the configuration directory in its default location,
|
341 |
wanted to keep the configuration directory in its default location,
|
329 |
but desired another location for the index, typically out of disk
|
342 |
but desired another location for the index, typically out of disk
|
330 |
occupation concerns.
|
343 |
occupation concerns.
|
331 |
|
344 |
|
332 |
The size of the index is determined by the size of the set of documents,
|
345 |
The size of the index is determined by the document set size, but the
|
333 |
but the ratio can vary a lot. For a typical mixed set of documents, the
|
346 |
ratio can vary a lot. For a typical mixed set of documents, the index size
|
334 |
index size will often be close to the data set size. In specific cases (a
|
347 |
will often be close to the data set size. In specific cases (a set of
|
335 |
set of compressed mbox files for example), the index can become much
|
348 |
compressed mbox files for example), the index can become much bigger than
|
336 |
bigger than the documents. It may also be much smaller if the documents
|
349 |
the documents. It may also be much smaller if the documents contain a lot
|
337 |
contain a lot of images or other non-indexed data (an extreme example
|
350 |
of images or other non-indexed data (an extreme example being a set of mp3
|
338 |
being a set of mp3 files where only the tags would be indexed).
|
351 |
files where only the tags would be indexed).
|
339 |
|
352 |
|
340 |
Of course, images, sound and video do not increase the index size, which
|
353 |
Of course, images, sound and video do not increase the index size, which
|
341 |
means that it will be quite typical nowadays (2006), that even a big index
|
354 |
means that it will be quite typical nowadays (2006), that even a big index
|
342 |
will be negligible against the total amount of data on the computer.
|
355 |
will be negligible against the total amount of data on the computer.
|
343 |
|
356 |
|
|
... |
|
... |
403 |
You can also use multiple indexes defined by separate configurations,
|
416 |
You can also use multiple indexes defined by separate configurations,
|
404 |
typically to separate personal and shared indexes, or to take advantage of
|
417 |
typically to separate personal and shared indexes, or to take advantage of
|
405 |
the organization of your data to improve search precision.
|
418 |
the organization of your data to improve search precision.
|
406 |
|
419 |
|
407 |
The first time you start recoll, you will be asked whether or not you
|
420 |
The first time you start recoll, you will be asked whether or not you
|
408 |
would like recoll to build the index. If you want to adjust the
|
421 |
would like it to build the index. If you want to adjust the configuration
|
409 |
configuration before indexing, just click Cancel at this point. That way,
|
422 |
before indexing, just click Cancel at this point, which will get you into
|
410 |
recoll will have created a ~/.recoll directory containing empty
|
423 |
the configuration interface. If you exit, recoll will have created a
|
411 |
configuration files.
|
424 |
~/.recoll directory containing empty configuration files, which you can
|
|
|
425 |
edit by hand.
|
412 |
|
426 |
|
413 |
The configuration is documented inside the installation chapter of this
|
427 |
The configuration is documented inside the installation chapter of this
|
414 |
document, or in the recoll.conf(5) man page, but the most current
|
428 |
document, or in the recoll.conf(5) man page, but the most current
|
415 |
information will most likely be the comments inside the sample file. The
|
429 |
information will most likely be the comments inside the sample file. The
|
416 |
most immediately useful variable you may be interested in is probably
|
430 |
most immediately useful variable you may be interested in is probably
|
|
... |
|
... |
445 |
use it on hand-edited files, which you might nevertheless want to backup
|
459 |
use it on hand-edited files, which you might nevertheless want to backup
|
446 |
first...
|
460 |
first...
|
447 |
|
461 |
|
448 |
----------------------------------------------------------------------
|
462 |
----------------------------------------------------------------------
|
449 |
|
463 |
|
|
|
464 |
2.4. Using Beagle WEB browser plugins
|
|
|
465 |
|
|
|
466 |
Beagle is a concurrent desktop indexer, built on Lucene and the Mono
|
|
|
467 |
project (C#), for which a number of add-on browser plugins were written.
|
|
|
468 |
These work by copying visited web pages to an indexing queue directory,
|
|
|
469 |
which the indexer then processes.
|
|
|
470 |
|
|
|
471 |
If, for any reason, you so happen to prefer Recoll to Beagle, you can
|
|
|
472 |
still use the browser plugins (they are written in Javascript and
|
|
|
473 |
completely independent of C#, Beagle, Lucene...). Recoll can process the
|
|
|
474 |
Beagle queue directory. Of course, this supposes that Beagle is not
|
|
|
475 |
running, else both programs will fight for the same files.
|
|
|
476 |
|
|
|
477 |
This feature can be enabled in the GUI indexing configuration panel, or by
|
|
|
478 |
editing the configuration file (set processbeaglequeue to 1).
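
For the hand-editing route, a minimal sketch follows. It assumes that
the main configuration file is recoll.conf inside the configuration
directory and that parameters use the "name = value" form; the helper
name enable_beagle_queue is invented for this example.

```shell
# Hypothetical helper: make sure processbeaglequeue is set in recoll.conf.
# The line is appended only if no processbeaglequeue entry exists yet,
# so calling the function twice does not duplicate it.
enable_beagle_queue() {
    conf="${1:-$HOME/.recoll}/recoll.conf"
    if ! grep -q '^[[:space:]]*processbeaglequeue[[:space:]]*=' "$conf" 2>/dev/null; then
        printf 'processbeaglequeue = 1\n' >> "$conf"
    fi
}
```

An existing entry with another value is left untouched by this sketch
and would still have to be changed in an editor.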
|
|
|
479 |
|
|
|
480 |
----------------------------------------------------------------------
|
|
|
481 |
|
450 |
2.4. Periodic indexing
|
482 |
2.5. Periodic indexing
|
451 |
|
483 |
|
452 |
2.4.1. Starting indexing
|
484 |
2.5.1. Starting indexing
|
453 |
|
485 |
|
454 |
Indexing is performed either by the recollindex program, or by the
|
486 |
Indexing is performed either by the recollindex program, or by the
|
455 |
indexing thread inside the recoll program (use the File menu). Both
|
487 |
indexing thread inside the recoll program (use the File menu). Both
|
456 |
programs will use the RECOLL_CONFDIR variable or accept a -c confdir
|
488 |
programs will use the RECOLL_CONFDIR variable or accept a -c confdir
|
457 |
option to specify a non-default configuration directory.
|
489 |
option to specify a non-default configuration directory.
|
458 |
|
490 |
|
459 |
If the recoll program finds no index when it starts, it will automatically
|
491 |
If the recoll program finds no index when it starts, it will automatically
|
460 |
start indexing (except if canceled).
|
492 |
start indexing (except if canceled).
|
461 |
|
493 |
|
462 |
It is best to avoid interrupting the indexing process, as this may
|
494 |
The indexing process can be interrupted by sending an interrupt (^C,
|
463 |
sometimes leave the index in a bad state. This is not a serious problem,
|
495 |
SIGINT) or terminate (SIGTERM) signal. Some time may elapse before the
|
464 |
as you then just need to delete the index files and restart the indexing.
|
496 |
process exits, because it needs to properly flush and close the index. The
|
465 |
The index files are normally stored in the $HOME/.recoll/xapiandb
|
497 |
indexing will restart at the interruption point the next time (the full
|
466 |
directory, which you can just delete if needed. Alternatively, you can
|
498 |
file tree will still be traversed, but files that were indexed up to the
|
467 |
start recollindex with option -z, which will reset the database before
|
499 |
interruption and are still up to date will not need to be reindexed).
|
468 |
indexing.
|
|
|
469 |
|
500 |
|
470 |
----------------------------------------------------------------------
|
501 |
After such an interruption, the index will be somewhat inconsistent
|
|
|
502 |
because some operations which are normally performed at the end of the
|
|
|
503 |
indexing pass will have been skipped (for example, the stemming and
|
|
|
504 |
spelling databases will be nonexistent or out of date). You just need to
|
|
|
505 |
restart indexing at a later time to restore consistency.
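
The interruption described above can be scripted. The sketch below is
only an illustration: stop_indexer is an invented name, and the way
you obtain the process id (for example with pgrep, if your system has
it) is an assumption.

```shell
# Hypothetical helper: ask an indexing process to stop, then wait for it.
# SIGTERM lets the process flush and close the index, which may take a
# while, so the loop polls until the process has really gone away.
stop_indexer() {
    pid="$1"
    # Fail early if the process does not exist or cannot be signalled
    kill -TERM "$pid" 2>/dev/null || return 1
    # kill -0 sends no signal, it only tests that the process still exists
    while kill -0 "$pid" 2>/dev/null; do
        sleep 1
    done
}
```

A later plain recollindex run would then bring the stemming and
spelling data back up to date.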
|
471 |
|
506 |
|
|
|
507 |
----------------------------------------------------------------------
|
|
|
508 |
|
472 |
2.4.2. Using cron to automate indexing
|
509 |
2.5.2. Using cron to automate indexing
|
473 |
|
510 |
|
474 |
The most common way to set up indexing is to have a cron task execute it
|
511 |
The most common way to set up indexing is to have a cron task execute it
|
475 |
every night. For example the following crontab entry would do it every day
|
512 |
every night. For example the following crontab entry would do it every day
|
476 |
at 3:30AM (supposing recollindex is in your PATH):
|
513 |
at 3:30AM (supposing recollindex is in your PATH):
|
477 |
|
514 |
|
478 |
30 3 * * * recollindex > /tmp/recolltrace 2>&1
|
515 |
30 3 * * * recollindex > /some/tmp/dir/recolltrace 2>&1
|
|
|
516 |
|
|
|
517 |
Or, using anacron:
|
|
|
518 |
|
|
|
519 |
1 15 recollindex su mylogin -c "recollindex > /tmp/rcltraceme 2>&1"
|
479 |
|
520 |
|
480 |
The usual command to edit your crontab is crontab -e (which will usually
|
521 |
The usual command to edit your crontab is crontab -e (which will usually
|
481 |
start the vi editor to edit the file). You may have more sophisticated
|
522 |
start the vi editor to edit the file). You may have more sophisticated
|
482 |
tools available on your system.
|
523 |
tools available on your system.
|
483 |
|
524 |
|
484 |
----------------------------------------------------------------------
|
525 |
----------------------------------------------------------------------
|
485 |
|
526 |
|
486 |
2.5. Real time indexing
|
527 |
2.6. Real time indexing
|
487 |
|
528 |
|
488 |
Real time monitoring/indexing is performed by starting the recollindex -m
|
529 |
Real time monitoring/indexing is performed by starting the recollindex -m
|
489 |
command. With this option, recollindex will detach from the terminal and
|
530 |
command. With this option, recollindex will detach from the terminal and
|
490 |
become a daemon, permanently monitoring file changes and updating the
|
531 |
become a daemon, permanently monitoring file changes and updating the
|
491 |
index.
|
532 |
index.
|
|
... |
|
... |
511 |
|
552 |
|
512 |
The indexing daemon gets started, then the window manager, for which the
|
553 |
The indexing daemon gets started, then the window manager, for which the
|
513 |
session waits.
|
554 |
session waits.
|
514 |
|
555 |
|
515 |
By default the indexing daemon will monitor the state of the X11 session,
|
556 |
By default the indexing daemon will monitor the state of the X11 session,
|
516 |
and exit when it finishes, so it is not necessary to kill it explicitly.
|
557 |
and exit when it finishes, so it is not necessary to kill it explicitly. (The
|
517 |
(The X11 server monitoring can be disabled with option -x to recollindex).
|
558 |
X11 server monitoring can be disabled with option -x to recollindex).
|
518 |
|
559 |
|
519 |
Under KDE, you can place a small script to start recollindex -m under
|
560 |
Under KDE, you can place a small script to start recollindex -m under
|
520 |
$HOME/.kde/Autostart. This will be executed when the session begins.
|
561 |
$HOME/.kde/Autostart. This will be executed when the session begins.
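
One possible shape for such a script is sketched here. The file name
recollindex-monitor.sh and the helper install_autostart are made up for
the example; only the $HOME/.kde/Autostart location and the
recollindex -m command come from the text above.

```shell
# Hypothetical helper: write a small executable script that starts the
# real time indexer, into the KDE autostart directory (or another
# directory given as first argument).
install_autostart() {
    dir="${1:-$HOME/.kde/Autostart}"
    mkdir -p "$dir"
    cat > "$dir/recollindex-monitor.sh" <<'EOF'
#!/bin/sh
# Start the Recoll real time indexer; recollindex -m detaches by itself
exec recollindex -m
EOF
    chmod +x "$dir/recollindex-monitor.sh"
}
```

The generated script assumes that recollindex is in the PATH of the
session.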
|
521 |
|
562 |
|
522 |
There is a similar mechanism under Gnome (find the session control tool in
|
563 |
There is a similar mechanism under Gnome (find the session control tool in
|
523 |
the menus and use the "Startup programs" tab).
|
564 |
the menus and use the "Startup programs" tab).
|
524 |
|
565 |
|
525 |
By default, the indexing daemon will write its messages to a file inside
|
566 |
By default, the messages from the indexing daemon will be discarded. You
|
526 |
the configuration directory (this is controlled by the daemlogfilename and
|
567 |
may want to change this by setting the daemlogfilename and daemloglevel
|
527 |
daemloglevel configuration parameters). You may want to change this. Also
|
568 |
configuration parameters. Also the log file will only be truncated when
|
528 |
the log file will only be truncated when the daemon starts. If the daemon
|
569 |
the daemon starts. If the daemon runs permanently, the log file may grow
|
529 |
runs permanently, the log file may grow quite big, depending on the log
|
570 |
quite big, depending on the log level.
|
530 |
level.
|
|
|
531 |
|
571 |
|
532 |
While it is convenient that data is indexed in real time, repeated
|
572 |
While it is convenient that data is indexed in real time, repeated
|
533 |
indexing can generate a significant load on the system when files such as
|
573 |
indexing can generate a significant load on the system when files such as
|
534 |
email folders change. Also, monitoring large file trees by itself
|
574 |
email folders change. Also, monitoring large file trees by itself
|
535 |
significantly taxes system resources. You probably do not want to enable
|
575 |
significantly taxes system resources. You probably do not want to enable
|
|
... |
|
... |
582 |
better scores). Any term will search for documents where at least one of
|
622 |
better scores). Any term will search for documents where at least one of
|
583 |
the terms appears.
|
623 |
the terms appears.
|
584 |
|
624 |
|
585 |
File name will specifically look for file names. The entry will be split
|
625 |
File name will specifically look for file names. The entry will be split
|
586 |
at white space characters, and each pattern will be separately expanded.
|
626 |
at white space characters, and each pattern will be separately expanded.
|
587 |
If you want to search for a pattern including white space, you need to use
|
627 |
If you want to search for a pattern including white space, use double
|
588 |
double quotes. The point of having a separate file name search is that
|
628 |
quotes. The point of having a separate file name search is that wild card
|
589 |
wild card expansion can be performed more efficiently on a relatively
|
629 |
expansion can be performed more efficiently on a relatively small subset
|
590 |
small subset of the index.
|
630 |
of the index.
|
591 |
|
631 |
|
592 |
The fourth entry (Query Language) is described in its own section.
|
632 |
The fourth entry (Query Language) is described in its own section.
|
593 |
|
633 |
|
594 |
All search modes allow wildcards inside terms (*, ?, []). You may want to
|
634 |
All search modes allow wildcards inside terms (*, ?, []). You may want to
|
595 |
have a look at the section about wildcards for more information about
|
635 |
have a look at the section about wildcards for more information about
|
|
... |
|
... |
599 |
enclosing the input inside double quotes. Ex: "virtual reality".
|
639 |
enclosing the input inside double quotes. Ex: "virtual reality".
|
600 |
|
640 |
|
601 |
Character case has no influence on search, except that you can disable
|
641 |
Character case has no influence on search, except that you can disable
|
602 |
stem expansion for any term by capitalizing it. Ie: a search for floor
|
642 |
stem expansion for any term by capitalizing it. Ie: a search for floor
|
603 |
will also normally look for flooring, floored, etc., but a search for
|
643 |
will also normally look for flooring, floored, etc., but a search for
|
604 |
Floor will only look for floor, in any character case. Sstemming can also
|
644 |
Floor will only look for floor, in any character case. Stemming can also
|
605 |
be disabled globally in the preferences.
|
645 |
be disabled globally in the preferences.
|
606 |
|
646 |
|
607 |
Recoll remembers the last few searches that you performed. You can use the
|
647 |
Recoll remembers the last few searches that you performed. You can use the
|
608 |
simple search text entry widget (a combobox) to recall them (click on the
|
648 |
simple search text entry widget (a combobox) to recall them (click on the
|
609 |
thing at the right of the text field). Please note, however, that only the
|
649 |
thing at the right of the text field). Please note, however, that only the
|
|
... |
|
... |
614 |
extracted from the database.
|
654 |
extracted from the database.
|
615 |
|
655 |
|
616 |
Double-clicking on a word in the result list or a preview window will
|
656 |
Double-clicking on a word in the result list or a preview window will
|
617 |
insert it into the simple search entry field.
|
657 |
insert it into the simple search entry field.
|
618 |
|
658 |
|
619 |
Note that, apart from wildcard characters (single ? characters are ok),
|
|
|
620 |
you can cut and paste any text into an All terms or Any term search field,
|
659 |
You can cut and paste any text into an All terms or Any term search field,
|
621 |
punctuation, newlines and all. Recoll will process it and produce a
|
660 |
punctuation, newlines and all - except for wildcard characters (single ?
|
|
|
661 |
characters are ok). Recoll will process it and produce a meaningful
|
622 |
meaningful search. This is what most differentiates this mode from the
|
662 |
search. This is what most differentiates this mode from the Query Language
|
623 |
Query Language mode, where you have to care about the syntax.
|
663 |
mode, where you have to care about the syntax.
|
624 |
|
664 |
|
625 |
You can use the Tools / Advanced search dialog for more complex searches.
|
665 |
You can use the Tools / Advanced search dialog for more complex searches.
|
626 |
|
666 |
|
627 |
----------------------------------------------------------------------
|
667 |
----------------------------------------------------------------------
|
628 |
|
668 |
|
|
... |
|
... |
640 |
open tabs in the existing preview window. You can use Shift+Click to force
|
680 |
open tabs in the existing preview window. You can use Shift+Click to force
|
641 |
the creation of another preview window, which may be useful to view the
|
681 |
the creation of another preview window, which may be useful to view the
|
642 |
documents side by side. (You can also browse successive results in a
|
682 |
documents side by side. (You can also browse successive results in a
|
643 |
single preview window by typing Shift+ArrowUp/Down in the window).
|
683 |
single preview window by typing Shift+ArrowUp/Down in the window).
|
644 |
|
684 |
|
645 |
Clicking the Edit link will attempt to start an external editor. The
|
685 |
Clicking the Open link will attempt to start an external viewer. The
|
646 |
editors can be configured through the user preferences dialog, or by
|
686 |
viewer for each document type can be configured through the user
|
647 |
editing the mimeview configuration file.
|
687 |
preferences dialog, or by editing the mimeview configuration file. You can
|
|
|
688 |
also check the Use desktop preferences option in the user preferences
|
|
|
689 |
dialog to use the desktop defaults for all documents. This is probably the
|
|
|
690 |
best option if you are using a well configured Gnome or KDE desktop.
|
648 |
|
691 |
|
649 |
The Preview and Edit edit links may not be present for all entries,
|
692 |
The Preview and Open links may not be present for all entries,
|
650 |
meaning that Recoll has no configured way to preview a given file type
|
693 |
meaning that Recoll has no configured way to preview a given file type
|
651 |
(which was indexed by name only), or no configured external editor for the
|
694 |
(which was indexed by name only), or no configured external editor for the
|
652 |
file type. This can sometimes be adjusted simply by tweaking the mimemap
|
695 |
file type. This can sometimes be adjusted simply by tweaking the mimemap
|
653 |
and mimeview configuration files (the latter can be modified with the user
|
696 |
and mimeview configuration files (the latter can be modified with the user
|
654 |
preferences dialog).
|
697 |
preferences dialog).
|
|
... |
|
... |
685 |
|
728 |
|
686 |
* Save to File
|
729 |
* Save to File
|
687 |
|
730 |
|
688 |
* Find similar
|
731 |
* Find similar
|
689 |
|
732 |
|
|
|
733 |
* Preview Parent document
|
|
|
734 |
|
690 |
* Parent document
|
735 |
* Open Parent document
|
691 |
|
736 |
|
692 |
The Preview and Edit entries do the same thing as the corresponding links.
|
737 |
The Preview and Edit entries do the same thing as the corresponding links.
|
693 |
|
738 |
|
694 |
The Copy File Name and Copy Url copy the relevant data to the clipboard,
|
739 |
The Copy File Name and Copy Url copy the relevant data to the clipboard,
|
695 |
for later pasting.
|
740 |
for later pasting.
|
|
... |
|
... |
703 |
The Find similar entry will select a number of relevant terms from the
|
748 |
The Find similar entry will select a number of relevant terms from the
|
704 |
current document and enter them into the simple search field. You can then
|
749 |
current document and enter them into the simple search field. You can then
|
705 |
start a simple search, with a good chance of finding documents related to
|
750 |
start a simple search, with a good chance of finding documents related to
|
706 |
the current result.
|
751 |
the current result.
|
707 |
|
752 |
|
708 |
The Parent document entry will appear for documents which are not actually
|
753 |
The Parent document entries will appear for documents which are not
|
709 |
files but are part of, or attached to, a higher level document. This entry
|
754 |
actually files but are part of, or attached to, a higher level document.
|
710 |
is mainly useful for email attachments and permits viewing the message to
|
755 |
This entry is mainly useful for email attachments and permits viewing the
|
711 |
which the document is attached. Note that the entry will also appear for
|
756 |
message to which the document is attached. Note that the entry will also
|
712 |
an email which is part of an mbox folder file, but that you can't actually
|
757 |
appear for an email which is part of an mbox folder file, but that you
|
713 |
visualize the folder (there will be an error dialog if you try). Recoll is
|
758 |
can't actually visualize the folder (there will be an error dialog if you
|
714 |
unfortunately not yet smart enough to disable the entry in this case.
|
759 |
try). Recoll is unfortunately not yet smart enough to disable the entry in
|
|
|
760 |
this case. In other cases, the Open option makes sense, for example to
|
|
|
761 |
start a chm viewer on the parent document for a help page.
|
715 |
|
762 |
|
716 |
----------------------------------------------------------------------
|
763 |
----------------------------------------------------------------------
|
717 |
|
764 |
|
718 |
3.3. The preview window
|
765 |
3.3. The preview window
|
719 |
|
766 |
|
|
... |
|
... |
752 |
A right-click menu in the text area allows switching between displaying
|
799 |
A right-click menu in the text area allows switching between displaying
|
753 |
the main text or the contents of fields associated to the document (ie:
|
800 |
the main text or the contents of fields associated to the document (ie:
|
754 |
author, abstract, etc.). This is especially useful in cases where the term
|
801 |
author, abstract, etc.). This is especially useful in cases where the term
|
755 |
match did not occur in the main text but in one of the fields.
|
802 |
match did not occur in the main text but in one of the fields.
|
756 |
|
803 |
|
|
|
804 |
You can print the current preview window contents by typing ^P (Ctrl + P)
|
|
|
805 |
in the window text.
|
|
|
806 |
|
757 |
----------------------------------------------------------------------
|
807 |
----------------------------------------------------------------------
|
758 |
|
808 |
|
759 |
3.4. The query language
|
809 |
3.4. The query language
|
760 |
|
810 |
|
761 |
The query language processor is activated on the simple search entry when
|
811 |
The query language processor is activated on the simple search entry when
|
|
... |
|
... |
846 |
|
896 |
|
847 |
You can use the show query link at the top of the result list to check the
|
897 |
You can use the show query link at the top of the result list to check the
|
848 |
exact query which was finally executed by Xapian.
|
898 |
exact query which was finally executed by Xapian.
|
849 |
|
899 |
|
850 |
Most Xesam phrase modifiers are unsupported, except for l (small ell) to
|
900 |
Most Xesam phrase modifiers are unsupported, except for l (small ell) to
|
851 |
disable stemming, and p to turn an phrase into a NEAR (unordered) search.
|
901 |
disable stemming, and p to turn a phrase into a NEAR (unordered) search.
|
852 |
Example: "prejudice pride"p
|
902 |
Example: "prejudice pride"p
|
853 |
|
903 |
|
854 |
----------------------------------------------------------------------
|
904 |
----------------------------------------------------------------------
|
855 |
|
905 |
|
856 |
3.5. Complex/advanced search
|
906 |
3.5. Complex/advanced search
|
|
... |
|
... |
1160 |
Browsing the result list inside a preview window. Entering Shift-Down or
|
1210 |
Browsing the result list inside a preview window. Entering Shift-Down or
|
1161 |
Shift-Up (Shift + an arrow key) in a preview window will display the next
|
1211 |
Shift-Up (Shift + an arrow key) in a preview window will display the next
|
1162 |
or the previous document from the result list. Any secondary search
|
1212 |
or the previous document from the result list. Any secondary search
|
1163 |
currently active will be executed on the new document.
|
1213 |
currently active will be executed on the new document.
|
1164 |
|
1214 |
|
|
|
1215 |
Scrolling the result list from the keyboard. You can use PageUp and
|
|
|
1216 |
PageDown to scroll the result list, Shift+Home to go back to the first
|
|
|
1217 |
page. These work even while the focus is in the search entry.
|
|
|
1218 |
|
1165 |
Forced opening of a preview window. You can use Shift+Click on a result
|
1219 |
Forced opening of a preview window. You can use Shift+Click on a result
|
1166 |
list Preview link to force the creation of a preview window instead of a
|
1220 |
list Preview link to force the creation of a preview window instead of a
|
1167 |
new tab in the existing one.
|
1221 |
new tab in the existing one.
|
1168 |
|
1222 |
|
1169 |
Closing previews. Entering ^W in a tab will close it (and, for the last
|
1223 |
Closing previews. Entering ^W in a tab will close it (and, for the last
|
1170 |
tab, close the preview window). Entering Esc will close the preview window
|
1224 |
tab, close the preview window). Entering Esc will close the preview window
|
1171 |
and all its tabs.
|
1225 |
and all its tabs.
|
1172 |
|
1226 |
|
|
|
1227 |
Printing previews. Entering ^P in a preview window will print the
|
|
|
1228 |
currently displayed text.
|
|
|
1229 |
|
1173 |
Quitting. Entering ^Q almost anywhere will close the application.
|
1230 |
Quitting. Entering ^Q almost anywhere will close the application.
|
1174 |
|
1231 |
|
1175 |
----------------------------------------------------------------------
|
1232 |
----------------------------------------------------------------------
|
1176 |
|
1233 |
|
1177 |
3.12. Customizing the search interface
|
1234 |
3.12. Customizing the search interface
|
1178 |
|
1235 |
|
1179 |
It is possible to customize some aspects of the search interface by using
|
1236 |
You can customize some aspects of the search interface by using the Query
|
1180 |
Query configuration entry in the Preferences menu.
|
1237 |
configuration entry in the Preferences menu.
|
1181 |
|
1238 |
|
1182 |
There are two tabs in the dialog, dealing with the interface itself, and
|
1239 |
There are several tabs in the dialog, dealing with the interface itself,
|
1183 |
with the parameters used for searching and returning results.
|
1240 |
the parameters used for searching and returning results, and what indexes
|
|
|
1241 |
are searched.
|
1184 |
|
1242 |
|
1185 |
User interface parameters:
|
1243 |
User interface parameters:
|
1186 |
|
1244 |
|
1187 |
* Number of results in a result page:
|
1245 |
* Number of results in a result page:
|
1188 |
|
1246 |
|
|
... |
|
... |
1198 |
result list, and you may want to customize the font and/or font size.
|
1256 |
result list, and you may want to customize the font and/or font size.
|
1199 |
The rest of the fonts used by Recoll are determined by your generic QT
|
1257 |
The rest of the fonts used by Recoll are determined by your generic QT
|
1200 |
config (try the qtconfig command).
|
1258 |
config (try the qtconfig command).
|
1201 |
|
1259 |
|
1202 |
* Result paragraph format string: allows you to change the presentation
|
1260 |
* Result paragraph format string: allows you to change the presentation
|
1203 |
of each result list entry. This is a qt-html string where the
|
1261 |
of each result list entry. This is described in its own section.
|
1204 |
following printf-like % substitutions will be performed:
|
|
|
1205 |
|
1262 |
|
|
|
1263 |
* Maximum text size highlighted for preview: inserting highlights on
|
|
|
1264 |
search terms inside the text before inserting it in the preview window
|
|
|
1265 |
involves quite a lot of processing, and can be disabled over the given
|
|
|
1266 |
text size to speed up loading.
|
|
|
1267 |
|
|
|
1268 |
* Use desktop preferences to choose document editor: if this is checked,
|
|
|
1269 |
the xdg-open utility will be used to open files when you click the
|
|
|
1270 |
Edit link in the result list, instead of the application defined in
|
|
|
1271 |
mimeview. xdg-open will in turn use your desktop preferences to choose
|
|
|
1272 |
an appropriate application.
|
|
|
1273 |
|
|
|
1274 |
* Choose editor applications: this will let you choose the command
|
|
|
1275 |
started by the Edit links inside the result list, for specific
|
|
|
1276 |
document types.
|
|
|
1277 |
|
|
|
1278 |
* Display category filter as toolbar... this will let you choose if the
|
|
|
1279 |
document categories are displayed as a list or a set of buttons.
|
|
|
1280 |
|
|
|
1281 |
* Auto-start simple search on white space entry: if this is checked, a
|
|
|
1282 |
search will be executed each time you enter a space in the simple
|
|
|
1283 |
search input field. This lets you look at the result list as you enter
|
|
|
1284 |
new terms. This is off by default, you may like it or not...
|
|
|
1285 |
|
|
|
1286 |
* Start with advanced search dialog open and Start with sort dialog
|
|
|
1287 |
open: If you use these dialogs all the time, checking these entries
|
|
|
1288 |
will get them to open when recoll starts.
|
|
|
1289 |
|
|
|
1290 |
* Remember sort activation state: if set, Recoll will remember the sort
|
|
|
1291 |
tool state between invocations. It normally starts with sorting
|
|
|
1292 |
disabled.
|
|
|
1293 |
|
|
|
1294 |
* Prefer HTML to plain text for preview: if set, Recoll will display HTML
|
|
|
1295 |
as such inside the preview window. If this causes problems with the Qt
|
|
|
1296 |
HTML display, you can uncheck it to display the plain text version
|
|
|
1297 |
instead.
|
|
|
1298 |
|
|
|
1299 |
Search parameters:
|
|
|
1300 |
|
|
|
1301 |
* Stemming language: stemming obviously depends on the document's
|
|
|
1302 |
language. This listbox will let you choose among the stemming databases
|
|
|
1303 |
which were built during indexing (this is set in the main
|
|
|
1304 |
configuration file), or later added with recollindex -s (See the
|
|
|
1305 |
recollindex manual). Stemming languages which are dynamically added
|
|
|
1306 |
will be deleted at the next indexing pass unless they are also added
|
|
|
1307 |
in the configuration file.
|
|
|
1308 |
|
|
|
1309 |
* Dynamically add phrase to simple searches: a phrase will be
|
|
|
1310 |
automatically built and added to simple searches when looking for Any
|
|
|
1311 |
terms. This will give a relevance boost to the results where the
|
|
|
1312 |
search terms appear as a phrase (consecutive and in order).
|
|
|
1313 |
|
|
|
1314 |
* Replace abstracts from documents: this decides if we should synthesize
|
|
|
1315 |
and display an abstract in place of an explicit abstract found within
|
|
|
1316 |
the document itself.
|
|
|
1317 |
|
|
|
1318 |
* Dynamically build abstracts: this decides if Recoll tries to build
|
|
|
1319 |
document abstracts when displaying the result list. Abstracts are
|
|
|
1320 |
constructed by taking context from the document information, around
|
|
|
1321 |
the search terms. This can slow down result list display significantly
|
|
|
1322 |
for big documents, and you may want to turn it off.
|
|
|
1323 |
|
|
|
1324 |
* Replace abstracts from documents: this decides if we should synthesize
|
|
|
1325 |
and display an abstract in place of an explicit abstract found within
|
|
|
1326 |
the document itself.
|
|
|
1327 |
|
|
|
1328 |
* Synthetic abstract size: adjust to taste...
|
|
|
1329 |
|
|
|
1330 |
* Synthetic abstract context words: how many words should be displayed
|
|
|
1331 |
around each term occurrence.
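
The "context words" setting can be pictured with a small Python sketch. This is only an illustration of the idea, not Recoll's actual abstract builder: the function name context_snippets and the whitespace-based word splitting are assumptions made for the example.

```python
def context_snippets(text, term, context_words):
    """Return one snippet per occurrence of term, with up to
    context_words words kept on each side (hypothetical helper)."""
    words = text.split()
    snippets = []
    for i, word in enumerate(words):
        # Compare case-insensitively, ignoring surrounding punctuation.
        if word.lower().strip('.,;:!?"') == term.lower():
            start = max(0, i - context_words)
            snippets.append(" ".join(words[start:i + context_words + 1]))
    return snippets
```

A larger context value gives longer snippets for each occurrence, which is why the abstract size and the context word count are tuned together.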
|
|
|
1332 |
|
|
|
1333 |
External indexes: This panel will let you browse for additional indexes
|
|
|
1334 |
that you may want to search. External indexes are designated by their
|
|
|
1335 |
database directory (ie: /home/someothergui/.recoll/xapiandb,
|
|
|
1336 |
/usr/local/recollglobal/xapiandb).
|
|
|
1337 |
|
|
|
1338 |
Once entered, the indexes will appear in the External indexes list, and
|
|
|
1339 |
you can choose which ones you want to use at any moment by checking or
|
|
|
1340 |
unchecking their entries.
|
|
|
1341 |
|
|
|
1342 |
Your main database (the one the current configuration indexes to), is
|
|
|
1343 |
always implicitly active. If this is not desirable, you can set up your
|
|
|
1344 |
configuration so that it indexes, for example, an empty directory. An
|
|
|
1345 |
alternative indexer may also need to implement a way of purging the index
|
|
|
1346 |
from stale data.
|
|
|
1347 |
|
|
|
1348 |
----------------------------------------------------------------------
|
|
|
1349 |
|
|
|
1350 |
3.12.1. The result list paragraph format
|
|
|
1351 |
|
|
|
1352 |
The presentation of each result inside the result list can be customized
|
|
|
1353 |
by setting the result list paragraph format inside the User Interface tab
|
|
|
1354 |
of the Query configuration.
|
|
|
1355 |
|
|
|
1356 |
This is a Qt HTML string where the following printf-like % substitutions
|
|
|
1357 |
will be performed:
|
|
|
1358 |
|
1206 |
* %A. Abstract
|
1359 |
* %A. Abstract
|
1207 |
|
1360 |
|
1208 |
* %D. Date
|
1361 |
* %D. Date
|
1209 |
|
1362 |
|
1210 |
* %I. Icon image name
|
1363 |
* %I. Icon image name
|
1211 |
|
1364 |
|
1212 |
* %K. Keywords (if any)
|
1365 |
* %K. Keywords (if any)
|
1213 |
|
1366 |
|
1214 |
* %L. Preview and Edit links
|
1367 |
* %L. Preview and Edit links
|
1215 |
|
1368 |
|
1216 |
* %M. Mime type
|
1369 |
* %M. Mime type
|
1217 |
|
1370 |
|
1218 |
* %N. result Number
|
1371 |
* %N. result Number
|
1219 |
|
1372 |
|
1220 |
* %R. Relevance percentage
|
1373 |
* %R. Relevance percentage
|
1221 |
|
1374 |
|
1222 |
* %S. Size information
|
1375 |
* %S. Size information
|
1223 |
|
1376 |
|
1224 |
* %T. Title
|
1377 |
* %T. Title
|
1225 |
|
1378 |
|
1226 |
* %U. Url
|
1379 |
* %U. Url
|
1227 |
|
1380 |
|
|
|
1381 |
The format of the Preview and Edit links is <a href="P%N"> and <a
|
|
|
1382 |
href="E%N"> (%N expands to the document number inside the
|
|
|
1383 |
result list).
|
|
|
1384 |
|
|
|
1385 |
In addition to the predefined values above, all strings like %(fieldname)
|
|
|
1386 |
will be replaced by the value of the field named fieldname for this
|
|
|
1387 |
document. Only stored fields can be accessed in this way, the value of
|
|
|
1388 |
indexed but not stored fields is not known at this point in the search
|
|
|
1389 |
process (see field configuration). There are currently very few fields
|
|
|
1390 |
stored by default, apart from the values above (only author), so this
|
|
|
1391 |
feature will need some custom local configuration to be useful. For
|
|
|
1392 |
example, you could look at the fields for the document types of interest
|
|
|
1393 |
(use the right-click menu inside the preview window), and add what you
|
|
|
1394 |
want to the list of stored fields. A candidate example would be the
|
|
|
1395 |
recipient field which is generated by the message filters.
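
As a rough illustration of how such a format string is expanded, here is a minimal Python sketch. It is not Recoll's implementation: the function expand_format, the two dictionaries, and the choice to leave unknown markers untouched are all assumptions made for the example.

```python
import re

# Matches either %(fieldname) or a single-letter marker such as %T.
_MARKER = re.compile(r"%(?:\((\w+)\)|([A-Za-z]))")

def expand_format(fmt, values, fields):
    """Expand %X markers from values and %(name) markers from fields.

    values maps single letters (e.g. "T") to strings; fields maps stored
    field names (e.g. "author") to strings. Both are hypothetical inputs.
    """
    def replace(match):
        name, letter = match.group(1), match.group(2)
        if name is not None:
            # A field that is not stored expands to nothing in this sketch.
            return fields.get(name, "")
        # Unknown letters are left as they are (an assumption).
        return values.get(letter, match.group(0))
    return _MARKER.sub(replace, fmt)
```

For instance, expanding '<b>%T</b> by %(author)' with a title in values and an author in fields yields the title in bold followed by the author name.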
|
|
|
1396 |
|
1228 |
The default value for the string is:
|
1397 |
The default value for the paragraph format string is:
|
1229 |
|
1398 |
|
1230 |
<img src="%I" align="left">%R %S %L <b>%T</b><br>
|
1399 |
<img src="%I" align="left">%R %S %L <b>%T</b><br>
|
1231 |
%M %D <i>%U</i><br>
|
1400 |
%M %D <i>%U</i> %i<br>
|
1232 |
%A %K
|
1401 |
%A %K
|
1233 |
|
1402 |
|
1234 |
|
1403 |
|
1235 |
You may, for example, try the following for a more web-like
|
1404 |
You may, for example, try the following for a more web-like experience:
|
1236 |
experience:
|
|
|
1237 |
|
1405 |
|
1238 |
<u><b><a href="P%N">%T</a></b></u><br>
|
1406 |
<u><b><a href="P%N">%T</a></b></u><br>
|
1239 |
%A<font color=#008000>%U - %S</font> - %L
|
1407 |
%A<font color=#008000>%U - %S</font> - %L
|
1240 |
|
1408 |
|
1241 |
|
1409 |
|
1242 |
Or the clean looking:
|
1410 |
Or the clean looking:
|
1243 |
|
1411 |
|
1244 |
<img src="%I" align="left">%L <font color="#900000">%R</font>
|
1412 |
<img src="%I" align="left">%L <font color="#900000">%R</font>
|
1245 |
<b>%T</b><br>%S
|
1413 |
<b>%T</b><br>%S
|
1246 |
<font color="#808080"><i>%U</i></font>
|
1414 |
<font color="#808080"><i>%U</i></font>
|
1247 |
<table bgcolor="#e0e0e0">
|
1415 |
<table bgcolor="#e0e0e0">
|
1248 |
<tr><td><div>%A</div></td></tr>
|
1416 |
<tr><td><div>%A</div></td></tr>
|
1249 |
</table>%K
|
1417 |
</table>%K
|
1250 |
|
1418 |
|
1251 |
|
1419 |
|
1252 |
The format of the Preview and Edit links is <a href="Pdocnum"> and <a
|
1420 |
Note that the P%N link in the above paragraph makes the title a preview
|
1253 |
href="Edocnum"> where docnum is what %N would print. This makes the
|
1421 |
link.
|
1254 |
title a preview link in the above format.
|
|
|
1255 |
|
1422 |
|
1256 |
Please note that, due to the way the program handles right mouse
|
1423 |
Due to the way the program handles right mouse clicks in the result list,
|
1257 |
clicks in the result list, if the custom formatting results in
|
1424 |
if the custom formatting results in multiple paragraphs per result, right
|
1258 |
multiple paragraphs per result, right clicks will only work inside the
|
1425 |
clicks will only work inside the first one.
|
1259 |
first one.
|
|
|
1260 |
|
|
|
1261 |
* HTML help browser: this will let you choose your preferred browser
|
|
|
1262 |
which will be started from the Help menu to read the user manual. You
|
|
|
1263 |
can enter a simple name if the command is in your PATH, or browse for
|
|
|
1264 |
a full pathname.
|
|
|
1265 |
|
|
|
1266 |
* Auto-start simple search on white space entry: if this is checked, a
|
|
|
1267 |
search will be executed each time you enter a space in the simple
|
|
|
1268 |
search input field. This lets you look at the result list as you enter
|
|
|
1269 |
new terms. This is off by default, you may like it or not...
|
|
|
1270 |
|
|
|
1271 |
* Start with advanced search dialog open and Start with sort dialog
|
|
|
1272 |
open: If you use these dialogs all the time, checking these entries
|
|
|
1273 |
will get them to open when recoll starts.
|
|
|
1274 |
|
|
|
1275 |
* Use desktop preferences to choose document editor: if this is checked,
|
|
|
1276 |
the xdg-open utility will be used to open files when you click the
|
|
|
1277 |
Edit link in the result list, instead of the application defined in
|
|
|
1278 |
mimeview. xdg-open will in turn use your desktop preferences to choose
|
|
|
1279 |
an appropriate application.
|
|
|
1280 |
|
|
|
1281 |
Search parameters:
|
|
|
1282 |
|
|
|
1283 |
* Stemming language: stemming obviously depends on the document's
|
|
|
1284 |
language. This listbox will let you choose among the stemming databases
|
|
|
1285 |
which were built during indexing (this is set in the main
|
|
|
1286 |
configuration file), or later added with recollindex -s (See the
|
|
|
1287 |
recollindex manual). Stemming languages which are dynamically added
|
|
|
1288 |
will be deleted at the next indexing pass unless they are also added
|
|
|
1289 |
in the configuration file.
|
|
|
1290 |
|
|
|
1291 |
* Dynamically build abstracts: this decides if Recoll tries to build
|
|
|
1292 |
document abstracts when displaying the result list. Abstracts are
|
|
|
1293 |
constructed by taking context from the document information, around
|
|
|
1294 |
the search terms. This can slow down result list display significantly
|
|
|
1295 |
for big documents, and you may want to turn it off.
|
|
|
1296 |
|
|
|
1297 |
* Replace abstracts from documents: this decides if we should synthesize
|
|
|
1298 |
and display an abstract in place of an explicit abstract found within
|
|
|
1299 |
the document itself.
|
|
|
1300 |
|
|
|
1301 |
* Synthetic abstract size: adjust to taste...
|
|
|
1302 |
|
|
|
1303 |
* Synthetic abstract context words: how many words should be displayed
|
|
|
1304 |
around each term occurrence.
|
|
|
1305 |
|
|
|
1306 |
External indexes: This panel will let you browse for additional indexes
|
|
|
1307 |
that you may want to search. External indexes are designated by their
|
|
|
1308 |
database directory (ie: /home/someothergui/.recoll/xapiandb,
|
|
|
1309 |
/usr/local/recollglobal/xapiandb).
|
|
|
1310 |
|
|
|
1311 |
Once entered, the indexes will appear in the External indexes list, and
|
|
|
1312 |
you can choose which ones you want to use at any moment by checking or
|
|
|
1313 |
unchecking their entries.
|
|
|
1314 |
|
|
|
1315 |
Your main database (the one the current configuration indexes to), is
|
|
|
1316 |
always implicitly active. If this is not desirable, you can set up your
|
|
|
1317 |
configuration so that it indexes, for example, an empty directory. An
|
|
|
1318 |
alternative indexer may also need to implement a way of purging the index
|
|
|
1319 |
from stale data.
|
|
|
1320 |
|
1426 |
|
1321 |
----------------------------------------------------------------------
|
1427 |
----------------------------------------------------------------------
|
1322 |
|
1428 |
|
1323 |
Chapter 4. Searching with the KDE KIO slave
|
1429 |
Chapter 4. Searching with the KDE KIO slave
|
1324 |
|
1430 |
|
|
... |
|
... |
1411 |
|
1517 |
|
1412 |
recollq 'ilur -nautique mime:text/html'
|
1518 |
recollq 'ilur -nautique mime:text/html'
|
1413 |
Recoll query: ((((ilur:(wqf=11) OR ilurs) AND_NOT (nautique:(wqf=11)
|
1519 |
Recoll query: ((((ilur:(wqf=11) OR ilurs) AND_NOT (nautique:(wqf=11)
|
1414 |
OR nautiques OR nautiqu OR nautiquement)) FILTER Ttext/html))
|
1520 |
OR nautiques OR nautiqu OR nautiquement)) FILTER Ttext/html))
|
1415 |
4 results
|
1521 |
4 results
|
1416 |
text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/comptes.html] [comptes.html] 18593 bytes
|
1522 |
text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/comptes.html] [comptes.html] 18593 bytes
|
1417 |
text/html [file:///Users/uncrypted-dockes/projets/nautique/webnautique/articles/ilur1/index.html] [Constructio...
|
1523 |
text/html [file:///Users/uncrypted-dockes/projets/nautique/webnautique/articles/ilur1/index.html] [Constructio...
|
1418 |
text/html [file:///Users/uncrypted-dockes/projets/pagepers/index.html] [psxtcl/writemime/recoll]...
|
1524 |
text/html [file:///Users/uncrypted-dockes/projets/pagepers/index.html] [psxtcl/writemime/recoll]...
|
1419 |
text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/recu-chasse-maree....
|
1525 |
text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/recu-chasse-maree....
|
1420 |
|
1526 |
|
1421 |
----------------------------------------------------------------------
|
1527 |
----------------------------------------------------------------------
|
1422 |
|
1528 |
|
1423 |
Chapter 6. Programming interface
|
1529 |
Chapter 6. Programming interface
|
1424 |
|
1530 |
|
|
... |
|
... |
1437 |
|
1543 |
|
1438 |
Recoll filters are executable programs which translate from a specific
|
1544 |
Recoll filters are executable programs which translate from a specific
|
1439 |
format (ie: openoffice, acrobat, etc.) to the Recoll indexing input
|
1545 |
format (ie: openoffice, acrobat, etc.) to the Recoll indexing input
|
1440 |
format, which may be text/plain or text/html.
|
1546 |
format, which may be text/plain or text/html.
|
1441 |
|
1547 |
|
|
|
1548 |
As of Recoll 1.13, there are two kinds of filters:
|
|
|
1549 |
|
|
|
1550 |
* Simple filters (the old ones) run once and exit. They can be bare
|
|
|
1551 |
programs like antiword, or shell-scripts using other programs. They
|
|
|
1552 |
are very simple to write, just having to write the text to the
|
|
|
1553 |
standard output.
|
|
|
1554 |
|
|
|
1555 |
* Multiple filters, new in 1.13, run as long as their master process
|
|
|
1556 |
(ie: recollindex) is active. They can process multiple files (sparing
|
|
|
1557 |
the process startup time which can be very significant), or multiple
|
|
|
1558 |
documents per file (ie: for zip or chm files). They communicate with
|
|
|
1559 |
the indexer through a simple protocol, but are nevertheless a bit more
|
|
|
1560 |
complicated than the older kind. Most of these new filters are written
|
|
|
1561 |
in Python, using a common module to handle the protocol.
|
|
|
1562 |
|
|
|
1563 |
The following only describes the simple filters. If you are programmer
|
|
|
1564 |
enough to write one of the other kind, it shouldn't be too difficult to
|
|
|
1565 |
make sense of one of the existing modules (ie: rclzip).
|
|
|
1566 |
|
1442 |
Recoll filters are usually shell-scripts, but this is in no way necessary.
|
1567 |
Recoll simple filters are usually shell-scripts, but this is in no way
|
1443 |
These programs are extremely simple and most of the difficulty lies in
|
1568 |
necessary. These programs are extremely simple and most of the difficulty
|
1444 |
extracting the text from the native format, not outputting what is
|
1569 |
lies in extracting the text from the native format, not outputting what is
|
1445 |
expected by Recoll. Happily enough, most document formats already have
|
1570 |
expected by Recoll. Happily enough, most document formats already have
|
1446 |
translators or text extractors which handle the difficult part and can be
|
1571 |
translators or text extractors which handle the difficult part and can be
|
1447 |
called from the filter. In some cases the output of the translating program
|
1572 |
called from the filter. In some cases the output of the translating program
|
1448 |
is appropriate, and no intermediate shell-script is needed.
|
1573 |
is appropriate, and no intermediate shell-script is needed.
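
To make the idea of a simple filter concrete, here is a minimal Python sketch of one. It assumes that the indexer appends the input file name as the last command-line argument and reads the translated text from standard output; the extract_text function is a hypothetical stand-in for the real format-specific extraction, which is where the actual work would be.

```python
import sys

def extract_text(raw):
    """Hypothetical extraction step: here the input is simply treated
    as UTF-8 text. A real filter would call a format-specific tool."""
    return raw.decode("utf-8", errors="replace")

def main(argv, out=None):
    """Run once on a single file, write the text, and return an exit code."""
    out = out if out is not None else sys.stdout
    if len(argv) != 2:
        sys.stderr.write("usage: %s <file>\n" % (argv[0] if argv else "filter"))
        return 1
    with open(argv[1], "rb") as handle:
        out.write(extract_text(handle.read()))
    return 0

# Only act as a filter when called with exactly one file argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv))
```

The script runs once per file and exits, which is what distinguishes this kind of filter from the persistent ones.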
|
1449 |
|
1574 |
|
|
... |
|
... |
1457 |
The association of file types to filters is performed in the mimeconf
|
1582 |
The association of file types to filters is performed in the mimeconf
|
1458 |
file. A sample:
|
1583 |
file. A sample:
|
1459 |
|
1584 |
|
1460 |
[index]
|
1585 |
[index]
|
1461 |
application/msword = exec antiword -t -i 1 -m UTF-8;\
|
1586 |
application/msword = exec antiword -t -i 1 -m UTF-8;\
|
1462 |
mimetype=text/plain;charset=utf-8
|
1587 |
mimetype = text/plain ; charset=utf-8
|
1463 |
|
1588 |
|
1464 |
application/ogg = exec rclogg
|
1589 |
application/ogg = exec rclogg
|
1465 |
|
1590 |
|
1466 |
text/rtf = exec unrtf --nopict --html; charset=iso-8859-1; mimetype=text/html
|
1591 |
text/rtf = exec unrtf --nopict --html; charset=iso-8859-1; mimetype=text/html
|
1467 |
|
1592 |
|
|
|
1593 |
application/x-chm = execm rclchm
|
|
|
1594 |
|
1468 |
The fragment specifies that:
|
1595 |
The fragment specifies that:
|
1469 |
|
1596 |
|
1470 |
* application/msword files are processed by executing the antiword
|
1597 |
* application/msword files are processed by executing the antiword
|
1471 |
program, which outputs text/plain encoded in iso-8859-1.
|
1598 |
program, which outputs text/plain encoded in utf-8.
|
1472 |
|
1599 |
|
1473 |
* application/ogg files are processed by the rclogg script, with default
|
1600 |
* application/ogg files are processed by the rclogg script, with default
|
1474 |
output type (text/html, with encoding specified in the header, or
|
1601 |
output type (text/html, with encoding specified in the header, or
|
1475 |
utf-8 by default).
|
1602 |
utf-8 by default).
|
1476 |
|
1603 |
|
1477 |
* text/rtf is processed by unrtf, which outputs text/html. The
|
1604 |
* text/rtf is processed by unrtf, which outputs text/html. The
|
1478 |
iso-8859-1 encoding is specified because it is not the utf-8 default,
|
1605 |
iso-8859-1 encoding is specified because it is not the utf-8 default,
|
1479 |
and not output by unrtf in the HTML header section.
|
1606 |
and not output by unrtf in the HTML header section.
|
|
|
1607 |
|
|
|
1608 |
* application/x-chm is processed by a persistent filter. This is
|
|
|
1609 |
determined by the execm keyword.
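
The structure of these entries can be sketched in Python as follows. This is an illustration of the value syntax shown in the sample, not Recoll's own configuration parser; the function name parse_filter_spec is an assumption, and backslash line continuations are expected to have been joined beforehand.

```python
def parse_filter_spec(spec):
    """Split a mimeconf-style value into (keyword, command, attributes).

    keyword is "exec" or "execm", command is the argument list, and
    attributes holds the optional "name=value" pairs after semicolons.
    """
    parts = [part.strip() for part in spec.split(";")]
    words = parts[0].split()
    keyword, command = words[0], words[1:]
    attributes = {}
    for part in parts[1:]:
        if not part:
            continue
        name, _, value = part.partition("=")
        attributes[name.strip()] = value.strip()
    return keyword, command, attributes
```

With the text/rtf entry above, this returns the exec keyword, the unrtf command with its two options, and the charset and mimetype attributes; an entry such as "execm rclchm" yields an empty attribute set, which corresponds to the default output type.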
|
1480 |
|
1610 |
|
1481 |
The easiest way to write a new filter is probably to start from an
|
1611 |
The easiest way to write a new filter is probably to start from an
|
1482 |
existing one.
|
1612 |
existing one.
|
1483 |
|
1613 |
|
1484 |
Filters which output text/plain text are generally simpler, but they
|
1614 |
Filters which output text/plain text are generally simpler, but they
|
|
... |
|
... |
1550 |
A field becomes indexed by having a prefix defined in the [prefixes]
|
1680 |
A field becomes indexed by having a prefix defined in the [prefixes]
|
1551 |
section of the fields file. See the comments in there for details.
|
1681 |
section of the fields file. See the comments in there for details.
|
1552 |
|
1682 |
|
1553 |
A field becomes stored by appearing in the [stored] section of the fields
|
1683 |
A field becomes stored by appearing in the [stored] section of the fields
|
1554 |
file.
|
1684 |
file.
|
|
|
1685 |
|
|
|
1686 |
See the comments inside the fields file for more details.
|
1555 |
|
1687 |
|
1556 |
----------------------------------------------------------------------
|
1688 |
----------------------------------------------------------------------
|
1557 |
|
1689 |
|
1558 |
6.3. API
|
1690 |
6.3. API
|
1559 |
|
1691 |
|
|
... |
|
... |
1837 |
|
1969 |
|
1838 |
----------------------------------------------------------------------
|
1970 |
----------------------------------------------------------------------
|
1839 |
|
1971 |
|
1840 |
Chapter 7. Installation
|
1972 |
Chapter 7. Installation
|
1841 |
|
1973 |
|
1842 |
7.1. Installing a prebuilt copy
|
1974 |
7.1. Installing a binary copy
|
1843 |
|
1975 |
|
1844 |
Recoll binary packages from the Recoll web site are always linked
|
1976 |
There are three types of binary Recoll installations:
|
1845 |
statically to the Xapian libraries, and have no other dependencies. You
|
1977 |
|
|
|
1978 |
* Through your system's normal software distribution framework (ie,
|
|
|
1979 |
Debian/Ubuntu apt, FreeBSD ports, etc.).
|
|
|
1980 |
|
|
|
1981 |
* From a package downloaded from the Recoll web site.
|
|
|
1982 |
|
|
|
1983 |
* From a prebuilt tree downloaded from the Recoll web site.
|
|
|
1984 |
|
|
|
1985 |
In all cases, the strict software dependencies (ie on Xapian or iconv)
|
|
|
1986 |
will be automatically satisfied; you should not have to worry about them.
|
|
|
1987 |
|
1846 |
will only have to check or install supporting applications for the file
|
1988 |
You will only have to check or install supporting applications for the
|
1847 |
types that you want to index beyond text, HTML and mail files, and maybe
|
1989 |
file types that you want to index beyond those that are natively processed
|
1848 |
have a look at the configuration section (but this may not be necessary
|
1990 |
by Recoll (text, HTML, mail files, and a few others).
|
|
|
1991 |
|
|
|
1992 |
You should also maybe have a look at the configuration section (but this
|
1849 |
for a quick test with default parameters).
|
1993 |
may not be necessary for a quick test with default parameters). Most
|
|
|
1994 |
parameters can be more conveniently set from the GUI interface.
|
1850 |
|
1995 |
|
1851 |
----------------------------------------------------------------------
|
1996 |
----------------------------------------------------------------------
|
1852 |
|
1997 |
|
1853 |
7.1.1. Installing through a package system
|
1998 |
7.1.1. Installing through a package system
|
1854 |
|
1999 |
|
1855 |
If you use a BSD-type port system or a prebuilt package (RPM or other),
|
2000 |
If you use a BSD-type port system or a prebuilt package (DEB, RPM,
|
|
|
2001 |
manually or through the system software configuration utility), just
|
1856 |
just follow the usual procedure for your system.
|
2002 |
follow the usual procedure for your system.
|
1857 |
|
2003 |
|
1858 |
----------------------------------------------------------------------
|
2004 |
----------------------------------------------------------------------
|
1859 |
|
2005 |
|
1860 |
7.1.2. Installing a prebuilt Recoll
|
2006 |
7.1.2. Installing a prebuilt Recoll
|
1861 |
|
2007 |
|
|
... |
|
... |
1874 |
|
2020 |
|
1875 |
7.2. Supporting packages
|
2021 |
7.2. Supporting packages
|
1876 |
|
2022 |
|
1877 |
Recoll uses external applications to index some file types. You need to
|
2023 |
Recoll uses external applications to index some file types. You need to
|
1878 |
install them for the file types that you wish to have indexed (these are
|
2024 |
install them for the file types that you wish to have indexed (these are
|
1879 |
run-time dependencies. None is needed for building Recoll).
|
2025 |
run-time optional dependencies. None is needed for building or running
|
|
|
2026 |
Recoll except for indexing their specific file type).
|
1880 |
|
2027 |
|
1881 |
After an indexing pass, the commands that were found missing can be
|
2028 |
After an indexing pass, the commands that were found missing can be
|
1882 |
displayed from the recoll File menu. The list is stored in the missing
|
2029 |
displayed from the recoll File menu. The list is stored in the missing
|
1883 |
text file inside the configuration directory.
|
2030 |
text file inside the configuration directory.
|
1884 |
|
2031 |
|
|
... |
|
... |
1906 |
|
2053 |
|
1907 |
* dvi: dvips
|
2054 |
* dvi: dvips
|
1908 |
|
2055 |
|
1909 |
* djvu: DjVuLibre
|
2056 |
* djvu: DjVuLibre
|
1910 |
|
2057 |
|
1911 |
* MP3: Recoll will use the id3info command from the id3lib package to
|
2058 |
* mp3: Recoll will use the id3info command from the id3lib package to
|
1912 |
extract tag information. Without it, only the file names will be
|
2059 |
extract tag information. Without it, only the file names will be
|
1913 |
indexed.
|
2060 |
indexed.
|
1914 |
|
2061 |
|
|
|
2062 |
* flac files need metaflac.
|
|
|
2063 |
|
|
|
2064 |
* ogg files need ogginfo.
|
|
|
2065 |
|
1915 |
* Pictures: Recoll uses the Exiftool Perl package to extract tag
|
2066 |
* Pictures: Recoll uses the Exiftool Perl package to extract tag
|
1916 |
information. Most image file formats are supported.
|
2067 |
information. Most image file formats are supported. Note that there
|
|
|
2068 |
may not be much interest in indexing the technical tags (image size,
|
|
|
2069 |
aperture, etc.). This is only of interest if you store personal tags
|
|
|
2070 |
or textual descriptions inside the image files.
|
1917 |
|
2071 |
|
|
|
2072 |
* chm: files in Microsoft help format need Python and the pychm module
|
|
|
2073 |
(which needs chmlib).
|
|
|
2074 |
|
|
|
2075 |
* ics: iCalendar files need Python and the icalendar module.
|
|
|
2076 |
|
|
|
2077 |
* zip: Zip archives need Python (and the standard zipfile module).
|
|
|
2078 |
|
1918 |
Text, HTML, mail folders Openoffice and Scribus files are processed
|
2079 |
Text, HTML, mail folders, Openoffice and Scribus files are processed
|
1919 |
internally. Lyx is used to index Lyx files. Many filters need sed and awk.
|
2080 |
internally. Lyx is used to index Lyx files. Many filters need sed and awk.
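
If you want to check for helper commands before running an indexing pass, something like the following Python sketch can do it. It is independent of the missing-commands list that Recoll itself produces; the HELPERS table only repeats a few commands named above, and the function name missing_helpers is an assumption made for the example.

```python
import shutil

# File type -> helper command, taken from the list above (not exhaustive).
HELPERS = {
    "dvi": "dvips",
    "mp3": "id3info",
    "flac": "metaflac",
    "ogg": "ogginfo",
}

def missing_helpers(helpers, which=shutil.which):
    """Return the sorted file types whose helper command is not in PATH."""
    return sorted(ftype for ftype, cmd in helpers.items() if which(cmd) is None)
```

Calling missing_helpers(HELPERS) returns the file types that would currently be indexed by file name only, or not at all.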
|
1920 |
|
2081 |
|
1921 |
----------------------------------------------------------------------
|
2082 |
----------------------------------------------------------------------
|
1922 |
|
2083 |
|
1923 |
7.3. Building from source
|
2084 |
7.3. Building from source
|
1924 |
|
2085 |
|
1925 |
7.3.1. Prerequisites
|
2086 |
7.3.1. Prerequisites
|
1926 |
|
2087 |
|
1927 |
At the very least, you will need to download and install the xapian core
|
2088 |
At the very least, you will need to download and install the xapian core
|
1928 |
package (Recoll 1.9 normally uses version 1.0.2, but any 0.9 or 1.0.x
|
2089 |
package and the qt run-time and development packages. Check the Recoll
|
1929 |
version will work too), and the qt run-time and development packages
|
2090 |
download page for up to date version information.
|
1930 |
(Recoll development currently uses version 3.3.5, but any 3.3 version is
|
|
|
1931 |
probably OK).
|
|
|
1932 |
|
2091 |
|
1933 |
You will most probably be able to find a binary package for qt for your
|
2092 |
You will most probably be able to find a binary package for qt for your
|
1934 |
system. You may have to compile Xapian but this is not difficult (if you
|
2093 |
system. You may have to compile Xapian but this is not difficult (if you
|
1935 |
are using FreeBSD, there is a port).
|
2094 |
are using FreeBSD, there is a port).
|
1936 |
|
2095 |
|
|
... |
|
... |
1940 |
|
2099 |
|
1941 |
----------------------------------------------------------------------
|
2100 |
----------------------------------------------------------------------
|
1942 |
|
2101 |
|
1943 |
7.3.2. Building
|
2102 |
7.3.2. Building
|
1944 |
|
2103 |
|
1945 |
Recoll has been built on Linux (redhat7.3, mandriva 2005/6, Fedora Core
|
2104 |
Recoll has been built on Linux, FreeBSD, macosx, and Solaris, most
|
1946 |
3/4/5/6), FreeBSD 5/6, macosx, and Solaris 8. If you build on another
|
2105 |
versions after 2005 should be ok, maybe some older ones too (Solaris 8 is
|
1947 |
system, and need to modify things, I would very much welcome patches.
|
2106 |
ok). If you build on another system, and need to modify things, I would
|
|
|
2107 |
very much welcome patches.
|
1948 |
|
2108 |
|
1949 |
Depending on the qt configuration on your system, you may have to set the
|
2109 |
Depending on the qt configuration on your system, you may have to set the
|
1950 |
QTDIR and QMAKESPECS variables in your environment:
|
2110 |
QTDIR and QMAKESPECS variables in your environment:
|
1951 |
|
2111 |
|
1952 |
* QTDIR should point to the directory above the one that holds the qt
|
2112 |
* QTDIR should point to the directory above the one that holds the qt
|
|
... |
|
... |
1955 |
|
2115 |
|
1956 |
* QMAKESPECS should be set to the name of one of the qt mkspecs
|
2116 |
* QMAKESPECS should be set to the name of one of the qt mkspecs
|
1957 |
sub-directories (ie: linux-g++).
|
2117 |
sub-directories (ie: linux-g++).
|
1958 |
|
2118 |
|
1959 |
On many Linux systems, QTDIR is set by the login scripts, and QMAKESPECS
|
2119 |
On many Linux systems, QTDIR is set by the login scripts, and QMAKESPECS
|
1960 |
is not needed because there is a default link in mkspecs/.
|
2120 |
is not needed because there is a default link in mkspecs/. Neither should
|
|
|
2121 |
be needed with Qt 4.
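
As a rough sketch of the environment setup described above, the following
sets both variables (spelled as in this document) and checks that the
chosen mkspecs sub-directory exists. The /usr/lib/qt3 prefix and the
mkspecs location under QTDIR are assumptions about a typical Qt 3 install,
not values taken from the Recoll build files.

```shell
# Keep any value already set by the login scripts; otherwise fall back
# to a hypothetical Qt 3 prefix and the linux-g++ spec named above.
QTDIR="${QTDIR:-/usr/lib/qt3}"
QMAKESPECS="${QMAKESPECS:-linux-g++}"
export QTDIR QMAKESPECS

# Assumed layout: the mkspecs directory sits directly under QTDIR.
if [ -d "$QTDIR/mkspecs/$QMAKESPECS" ]; then
    echo "using mkspecs/$QMAKESPECS under $QTDIR"
else
    echo "warning: $QTDIR/mkspecs/$QMAKESPECS not found" >&2
fi
```

If the warning is printed, adjust QTDIR to match your system before
running configure.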
|
1961 |
|
2122 |
|
1962 |
Configure options: --without-aspell will disable the code for phonetic
|
2123 |
Configure options:
|
1963 |
matching of search terms. --with-fam or --with-inotify will enable the
|
2124 |
|
|
|
2125 |
* --without-aspell will disable the code for phonetic matching of search
|
|
|
2126 |
terms.
|
|
|
2127 |
|
|
|
2128 |
* --with-fam or --with-inotify will enable the code for real time
|
1964 |
code for real time indexing. Inotify support is enabled by default on
|
2129 |
indexing. Inotify support is enabled by default on recent Linux
|
1965 |
recent Linux systems.
|
2130 |
systems.
|
|
|
2131 |
|
|
|
2132 |
* --enable-xattr will enable code to fetch data from file extended
|
|
|
2133 |
attributes. This is only useful if some application stores data in
|
|
|
2134 |
there, and also needs some simple configuration (see comments in the
|
|
|
2135 |
fields configuration file).
|
|
|
2136 |
|
|
|
2137 |
* --with-file-command Specify the version of the 'file' command to use
|
|
|
2138 |
(ie: --with-file-command=/usr/local/bin/file). Can be useful to enable
|
|
|
2139 |
the gnu version on systems where the native one is bad.
|
|
|
2140 |
|
|
|
2141 |
* --without-gui Disable the Qt interface, and auxiliary uses of X11, and
|
|
|
2142 |
compile the command line version.
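
The options above can be combined on one command line. The following is
only a sketch: it assembles a configure invocation from three of the
listed options, prints it, and runs it only if a configure script is
present in the current directory. The /usr/local/bin/file path is the
illustrative one from the option list, and may not exist on your system.

```shell
# Options taken from the list above.
opts="--without-aspell --with-inotify"
# Illustrative path to an alternate 'file' command.
opts="$opts --with-file-command=/usr/local/bin/file"
cmd="./configure $opts"
echo "$cmd"

# Only run the build from inside an unpacked recoll source tree.
if [ -x ./configure ]; then
    $cmd && make
fi
```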
|
1966 |
|
2143 |
|
1967 |
Normal procedure:
|
2144 |
Normal procedure:
|
1968 |
|
2145 |
|
1969 |
cd recoll-xxx
|
2146 |
cd recoll-xxx
|
1970 |
configure
|
2147 |
configure
|
1971 |
make
|
2148 |
make
|
1972 |
(practices usual hardship-repelling invocations)
|
2149 |
(practices usual hardship-repelling invocations)
|
1973 |
|
2150 |
|
1974 |
|
2151 |
|
1975 |
There little auto-configuration. The configure script will mainly link one
|
2152 |
There is little auto-configuration. The configure script will mainly link
|
1976 |
of the system-specific files in the mk directory to mk/sysconf. If your
|
2153 |
one of the system-specific files in the mk directory to mk/sysconf. If
|
1977 |
system is not known yet, it will tell you as much, and you may want to
|
2154 |
your system is not known yet, it will tell you as much, and you may want
|
1978 |
manually copy and modify one of the existing files (the new file name
|
2155 |
to manually copy and modify one of the existing files (the new file name
|
1979 |
should be the output of uname -s).
|
2156 |
should be the output of uname -s).
|
1980 |
|
2157 |
|
1981 |
----------------------------------------------------------------------
|
2158 |
----------------------------------------------------------------------
|
1982 |
|
2159 |
|
1983 |
7.3.3. Installation
|
2160 |
7.3.3. Installation
|
|
... |
|
... |
2077 |
The default configuration will index your home directory. If this is not
|
2254 |
The default configuration will index your home directory. If this is not
|
2078 |
appropriate, start recoll to create a blank configuration, click Cancel,
|
2255 |
appropriate, start recoll to create a blank configuration, click Cancel,
|
2079 |
and edit the configuration file before restarting the command. This will
|
2256 |
and edit the configuration file before restarting the command. This will
|
2080 |
start the initial indexing, which may take some time.
|
2257 |
start the initial indexing, which may take some time.
|
2081 |
|
2258 |
|
2082 |
Parameters:
|
2259 |
Parameters affecting what we index:
|
2083 |
|
2260 |
|
2084 |
topdirs
|
2261 |
topdirs
|
2085 |
|
2262 |
|
2086 |
Specifies the list of directories or files to index (recursively
|
2263 |
Specifies the list of directories or files to index (recursively
|
2087 |
for directories). The indexer will not follow symbolic links
|
2264 |
for directories). The indexer will not follow symbolic links
|
2088 |
inside the indexed trees by default (see the followLinks options
|
2265 |
inside the indexed trees by default (see the followLinks options
|
2089 |
though).
|
2266 |
though).
|
2090 |
|
2267 |
|
2091 |
dbdir
|
|
|
2092 |
|
|
|
2093 |
The name of the Xapian data directory. It will be created if
|
|
|
2094 |
needed when the index is initialized. If this is not an absolute
|
|
|
2095 |
path, it will be interpreted relative to the configuration
|
|
|
2096 |
directory. The value can have embedded spaces but starting or
|
|
|
2097 |
trailing spaces will be trimmed. You cannot use quotes here.
|
|
|
2098 |
|
|
|
2099 |
skippedNames
|
2268 |
skippedNames
|
2100 |
|
2269 |
|
2101 |
A space-separated list of patterns for names of files or
|
2270 |
A space-separated list of patterns for names of files or
|
2102 |
directories that should be completely ignored. The list defined in
|
2271 |
directories that should be completely ignored. The list defined in
|
2103 |
the default file is:
|
2272 |
the default file is:
|
2104 |
|
2273 |
|
2105 |
skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
|
2274 |
skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
|
2106 |
*~ recollrc
|
2275 |
*~ .beagle .git .hg .bzr loop.ps .xsession-errors \
|
|
|
2276 |
.recoll* xapiandb recollrc recoll.conf
|
2107 |
|
2277 |
|
2108 |
The list can be redefined for sub-directories, but is only
|
2278 |
The list can be redefined at any sub-directory in the indexed
|
2109 |
actually changed for the top level ones in topdirs.
|
2279 |
area.
|
2110 |
|
2280 |
|
2111 |
The top-level directories are not affected by this list (that is,
|
2281 |
The top-level directories are not affected by this list (that is,
|
2112 |
a directory in topdirs might match and would still be indexed).
|
2282 |
a directory in topdirs might match and would still be indexed).
|
2113 |
|
2283 |
|
2114 |
The list in the default configuration does not exclude hidden
|
2284 |
The list in the default configuration does not exclude hidden
|
|
... |
|
... |
2147 |
avoid multiple indexing of linked files. No effort is made to
|
2317 |
avoid multiple indexing of linked files. No effort is made to
|
2148 |
avoid duplication when this option is set to true. This option can
|
2318 |
avoid duplication when this option is set to true. This option can
|
2149 |
be set individually for each of the topdirs members by using
|
2319 |
be set individually for each of the topdirs members by using
|
2150 |
sections. It can not be changed below the topdirs level.
|
2320 |
sections. It can not be changed below the topdirs level.
|
2151 |
|
2321 |
|
|
|
2322 |
indexedmimetypes
|
|
|
2323 |
|
|
|
2324 |
Recoll normally indexes any file which it knows how to read. This
|
|
|
2325 |
list lets you restrict the indexed mime types to what you specify.
|
|
|
2326 |
If the variable is unspecified or the list empty (the default),
|
|
|
2327 |
all supported types are processed.
|
|
|
2328 |
|
|
|
2329 |
compressedfilemaxkbs
|
|
|
2330 |
|
|
|
2331 |
Size limit for compressed (.gz or .bz2) files. These need to be
|
|
|
2332 |
decompressed in a temporary directory for identification, which
|
|
|
2333 |
can be very wasteful if 'uninteresting' big compressed files are
|
|
|
2334 |
present. Negative means no limit, 0 means no processing of any
|
|
|
2335 |
compressed file. Defaults to -1.
|
|
|
2336 |
|
|
|
2337 |
textfilemaxmbs
|
|
|
2338 |
|
|
|
2339 |
Maximum size for text files. Very big text files are often
|
|
|
2340 |
uninteresting logs. Set to -1 to disable (default 20MB).
|
|
|
2341 |
|
|
|
2342 |
textfilepagekbs
|
|
|
2343 |
|
|
|
2344 |
If set to other than -1, text files will be indexed as multiple
|
|
|
2345 |
documents of the given page size. This may be useful if you do
|
|
|
2346 |
want to index very big text files as it will both reduce memory
|
|
|
2347 |
usage at index time and help with loading data to the preview
|
|
|
2348 |
window. A size of a few megabytes would seem reasonable (default:
|
|
|
2349 |
1MB).
|
|
|
2350 |
|
|
|
2351 |
indexallfilenames
|
|
|
2352 |
|
|
|
2353 |
Recoll indexes file names in a special section of the database to
|
|
|
2354 |
allow specific file names searches using wild cards. This
|
|
|
2355 |
parameter decides if file name indexing is performed only for
|
|
|
2356 |
files with mime types that would qualify them for full text
|
|
|
2357 |
indexing, or for all files inside the selected subtrees,
|
|
|
2358 |
independently of mime type.
|
|
|
2359 |
|
|
|
2360 |
usesystemfilecommand
|
|
|
2361 |
|
|
|
2362 |
Decide if we use the file -i system command as a final step for
|
|
|
2363 |
determining the mime type for a file (the main procedure uses
|
|
|
2364 |
suffix associations as defined in the mimemap file). This can be
|
|
|
2365 |
useful for files with suffix-less names, but it will also cause
|
|
|
2366 |
the indexing of many bogus "text" files.
|
|
|
2367 |
|
|
|
2368 |
processbeaglequeue
|
|
|
2369 |
|
|
|
2370 |
If this is set, process the directory where Beagle Web browser
|
|
|
2371 |
plugins copy visited pages for indexing. Of course, Beagle MUST
|
|
|
2372 |
NOT be running, else things will behave strangely.
|
|
|
2373 |
|
|
|
2374 |
beaglequeuedir
|
|
|
2375 |
|
|
|
2376 |
The path to the Beagle indexing queue. This is hard-coded in the
|
|
|
2377 |
Beagle plugin as ~/.beagle/ToIndex so there should be no need to
|
|
|
2378 |
change it.
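
To illustrate how the parameters in this group fit together, here is a
sketch that writes a sample fragment to a scratch file for review. The
directory names and numeric values are hypothetical, the bracketed
per-directory section syntax is assumed from the mention of sections
above, and writing the boolean followLinks as 1 is also an assumption.

```shell
# Write the sample to a scratch file; nothing here touches a real
# Recoll configuration.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# Hypothetical trees to index.
topdirs = /home/me/docs /home/me/mail
# Skip compressed files bigger than about 50 MB (value in KB).
compressedfilemaxkbs = 50000
# Index big text files as 1 MB pages.
textfilepagekbs = 1000

# Section for one topdirs member: follow symbolic links only here.
[/home/me/docs]
followLinks = 1

# Section for a sub-directory: redefine the list of skipped names.
[/home/me/docs/build]
skippedNames = *.o *.tmp
EOF
cat "$conf"
```

Remember that followLinks can only be set at the topdirs level, while
skippedNames can be redefined at any sub-directory.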
|
|
|
2379 |
|
|
|
2380 |
Parameters affecting where and how we store things:
|
|
|
2381 |
|
|
|
2382 |
dbdir
|
|
|
2383 |
|
|
|
2384 |
The name of the Xapian data directory. It will be created if
|
|
|
2385 |
needed when the index is initialized. If this is not an absolute
|
|
|
2386 |
path, it will be interpreted relative to the configuration
|
|
|
2387 |
directory. The value can have embedded spaces but starting or
|
|
|
2388 |
trailing spaces will be trimmed. You cannot use quotes here.
|
|
|
2389 |
|
|
|
2390 |
maxfsoccuppc
|
|
|
2391 |
|
|
|
2392 |
Maximum file system occupation before we stop indexing. The value
|
|
|
2393 |
is a percentage, corresponding to what the "Capacity" df output
|
|
|
2394 |
column shows. The default value is 0, meaning no checking.
|
|
|
2395 |
|
|
|
2396 |
mboxcachedir
|
|
|
2397 |
|
|
|
2398 |
The directory where mbox message offsets cache files are held.
|
|
|
2399 |
This is normally $RECOLL_CONFDIR/mboxcache, but it may be useful
|
|
|
2400 |
to share a directory between different configurations.
|
|
|
2401 |
|
|
|
2402 |
mboxcacheminmbs
|
|
|
2403 |
|
|
|
2404 |
The minimum mbox file size over which we cache the offsets. There
|
|
|
2405 |
is really no sense in caching offsets for small files. The default
|
|
|
2406 |
is 5 MB.
|
|
|
2407 |
|
|
|
2408 |
webcachedir
|
|
|
2409 |
|
|
|
2410 |
This is only used by the Beagle web browser plugin indexing code,
|
|
|
2411 |
and defines where the cache for visited pages will live. Default:
|
|
|
2412 |
$RECOLL_CONFDIR/webcache
|
|
|
2413 |
|
|
|
2414 |
webcachemaxmbs
|
|
|
2415 |
|
|
|
2416 |
This is only used by the Beagle web browser plugin indexing code,
|
|
|
2417 |
and defines the maximum size for the web page cache. Default: 40
|
|
|
2418 |
MB.
|
|
|
2419 |
|
|
|
2420 |
idxflushmb
|
|
|
2421 |
|
|
|
2422 |
Threshold (megabytes of new text data) where we flush from memory
|
|
|
2423 |
to disk index. Setting this can help control memory usage. A value
|
|
|
2424 |
of 0 means no explicit flushing, letting Xapian use its own
|
|
|
2425 |
default, which is flushing every 10000 documents (memory usage
|
|
|
2426 |
depends on average document size). The default value is 10.
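
As a sketch of the storage parameters described above, the following
writes a sample fragment to a scratch file. The dbdir name and the 90%
occupation limit are illustrative choices; the other values simply
restate the defaults given in the text.

```shell
# Write the sample to a scratch file for review.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# Relative path: interpreted relative to the configuration directory.
dbdir = xapiandb
# Stop indexing when the file system is 90% full (0 means no check).
maxfsoccuppc = 90
# Flush to the disk index after each 10 MB of new text (the default).
idxflushmb = 10
# Only cache message offsets for mbox files over 5 MB (the default).
mboxcacheminmbs = 5
# Cap the visited web page cache at 40 MB (the default).
webcachemaxmbs = 40
EOF
cat "$conf"
```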
|
|
|
2427 |
|
|
|
2428 |
Miscellaneous:
|
|
|
2429 |
|
2152 |
loglevel,daemloglevel
|
2430 |
loglevel,daemloglevel
|
2153 |
|
2431 |
|
2154 |
Verbosity level for recoll and recollindex. A value of 4 lists
|
2432 |
Verbosity level for recoll and recollindex. A value of 4 lists
|
2155 |
quite a lot of debug/information messages. 2 only lists errors.
|
2433 |
quite a lot of debug/information messages. 2 only lists errors.
|
2156 |
The daemloglevel value is specific to the indexing monitor daemon.
|
2434 |
The daemloglevel value is specific to the indexing monitor daemon.
|
|
... |
|
... |
2176 |
character set definition (ie: plain text files). This can be
|
2454 |
character set definition (ie: plain text files). This can be
|
2177 |
redefined for any sub-directory. If it is not set at all, the
|
2455 |
redefined for any sub-directory. If it is not set at all, the
|
2178 |
character set used is the one defined by the nls environment
|
2456 |
character set used is the one defined by the nls environment
|
2179 |
(LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
|
2457 |
(LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
|
2180 |
|
2458 |
|
2181 |
maxfsoccuppc
|
2459 |
filtermaxseconds
|
2182 |
|
2460 |
|
2183 |
Maximum file system occupation before we stop indexing. The value
|
2461 |
Maximum filter execution time, after which it is aborted. Some
|
2184 |
is a percentage, corresponding to what the "Capacity" df output
|
2462 |
postscript programs just loop...
|
2185 |
column shows. The default value is 0, meaning no checking.
|
|
|
2186 |
|
2463 |
|
2187 |
idxflushmb
|
2464 |
maildefcharset
|
2188 |
|
2465 |
|
2189 |
Threshold (megabytes of new text data) where we flush from memory
|
2466 |
This can be used to define the default character set specifically
|
2190 |
to disk index. Setting this can help control memory usage. A value
|
2467 |
for mail messages which don't specify it. This is mainly useful
|
2191 |
of 0 means no explicit flushing, letting Xapian use its own
|
2468 |
for readpst (libpst) dumps, which are utf-8 but do not say so.
|
2192 |
default, which is flushing every 10000 documents (memory usage
|
2469 |
|
2193 |
depends on average document size). The default value is 10.
|
2470 |
localfields
|
|
|
2471 |
|
|
|
2472 |
This allows setting fields for all documents under a given
|
|
|
2473 |
directory. Typical usage would be to set an "rclaptg" field, to be
|
|
|
2474 |
used in mimeview to select a specific viewer. Ie:
|
|
|
2475 |
localfields=rclaptg=gnus;other=val, then select the specific viewer
|
|
|
2476 |
with mimetype|tag=... in mimeview.
|
2194 |
|
2477 |
|
2195 |
filtersdir
|
2478 |
filtersdir
|
2196 |
|
2479 |
|
2197 |
A directory to search for the external filter scripts used to
|
2480 |
A directory to search for the external filter scripts used to
|
2198 |
index some types of files. The value should not be changed, except
|
2481 |
index some types of files. The value should not be changed, except
|
|
... |
|
... |
2201 |
|
2484 |
|
2202 |
iconsdir
|
2485 |
iconsdir
|
2203 |
|
2486 |
|
2204 |
The name of the directory where recoll result list icons are
|
2487 |
The name of the directory where recoll result list icons are
|
2205 |
stored. You can change this if you want different images.
|
2488 |
stored. You can change this if you want different images.
|
2206 |
|
|
|
2207 |
guesscharset
|
|
|
2208 |
|
|
|
2209 |
Decide if we try to guess the character set of files if no
|
|
|
2210 |
internal value is available (ie: for plain text files). This does
|
|
|
2211 |
not work well in general, and should probably not be used.
|
|
|
2212 |
|
|
|
2213 |
usesystemfilecommand
|
|
|
2214 |
|
|
|
2215 |
Decide if we use the file -i system command as a final step for
|
|
|
2216 |
determining the mime type for a file (the main procedure uses
|
|
|
2217 |
suffix associations as defined in the mimemap file). This can be
|
|
|
2218 |
useful for files with suffix-less names, but it will also cause
|
|
|
2219 |
the indexing of many bogus "text" files.
|
|
|
2220 |
|
|
|
2221 |
indexedmimetypes
|
|
|
2222 |
|
|
|
2223 |
Recoll normally indexes any file which it knows how to read. This
|
|
|
2224 |
list lets you restrict the indexed mime types to what you specify.
|
|
|
2225 |
If the variable is unspecified or the list empty (the default),
|
|
|
2226 |
all supported types are processed.
|
|
|
2227 |
|
|
|
2228 |
compressedfilemaxkbs
|
|
|
2229 |
|
|
|
2230 |
Size limit for compressed (.gz or .bz2) files. These need to be
|
|
|
2231 |
decompressed in a temporary directory for identification, which
|
|
|
2232 |
can be very wasteful if 'uninteresting' big compressed files are
|
|
|
2233 |
present. Negative means no limit, 0 means no processing of any
|
|
|
2234 |
compressed file. Defaults to -1.
|
|
|
2235 |
|
|
|
2236 |
indexallfilenames
|
|
|
2237 |
|
|
|
2238 |
Recoll indexes file names in a special section of the database to
|
|
|
2239 |
allow specific file names searches using wild cards. This
|
|
|
2240 |
parameter decides if file name indexing is performed only for
|
|
|
2241 |
files with mime types that would qualify them for full text
|
|
|
2242 |
indexing, or for all files inside the selected subtrees,
|
|
|
2243 |
independently of mime type.
|
|
|
2244 |
|
2489 |
|
2245 |
idxabsmlen
|
2490 |
idxabsmlen
|
2246 |
|
2491 |
|
2247 |
Recoll stores an abstract for each indexed file inside the
|
2492 |
Recoll stores an abstract for each indexed file inside the
|
2248 |
database. The text can come from an actual 'abstract' section in
|
2493 |
database. The text can come from an actual 'abstract' section in
|
|
... |
|
... |
2282 |
This lets you adjust the size of n-grams used for indexing CJK
|
2527 |
This lets you adjust the size of n-grams used for indexing CJK
|
2283 |
text. The default value of 2 is probably appropriate in most
|
2528 |
text. The default value of 2 is probably appropriate in most
|
2284 |
cases. A value of 3 would allow more precision and efficiency on
|
2529 |
cases. A value of 3 would allow more precision and efficiency on
|
2285 |
longer words, but the index will be approximately twice as large.
|
2530 |
longer words, but the index will be approximately twice as large.
|
2286 |
|
2531 |
|
|
|
2532 |
guesscharset
|
|
|
2533 |
|
|
|
2534 |
Decide if we try to guess the character set of files if no
|
|
|
2535 |
internal value is available (ie: for plain text files). This does
|
|
|
2536 |
not work well in general, and should probably not be used.
|
|
|
2537 |
|
2287 |
----------------------------------------------------------------------
|
2538 |
----------------------------------------------------------------------
|
2288 |
|
2539 |
|
2289 |
7.4.2. The mimemap file
|
2540 |
7.4.2. The mimemap file
|
2290 |
|
2541 |
|
2291 |
mimemap specifies the file name extension to mime type mappings.
|
2542 |
mimemap specifies the file name extension to mime type mappings.
|
|
... |
|
... |
2341 |
non-default entries, which will override those from the central
|
2592 |
non-default entries, which will override those from the central
|
2342 |
configuration file.
|
2593 |
configuration file.
|
2343 |
|
2594 |
|
2344 |
Please note that these entries must be placed under a [view] section.
|
2595 |
Please note that these entries must be placed under a [view] section.
|
2345 |
|
2596 |
|
|
|
2597 |
The keys in the file are normally mime types. You can add an application
|
|
|
2598 |
tag to specialize the choice for an area of the filesystem (using a
|
|
|
2599 |
localfields specification in mimeconf). The syntax for the key is
|
|
|
2600 |
mimetype|tag
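
The tag mechanism can be sketched as follows. This writes two sample
fragments to a scratch directory: one sets the rclaptg field for a mail
tree, the other adds a tagged mimeview key. The directory name and both
viewer commands are hypothetical, the %f file name placeholder is an
assumption, and so is placing the localfields line in the main
configuration file under a per-directory section.

```shell
dir=$(mktemp -d)

# Tag every document under a hypothetical mail tree.
cat > "$dir/recoll.conf" <<'EOF'
[/home/me/mail]
localfields = rclaptg=gnus
EOF

# The tagged key is meant to apply only to documents carrying the tag.
cat > "$dir/mimeview" <<'EOF'
[view]
message/rfc822 = some-mail-viewer %f
message/rfc822|gnus = some-gnus-wrapper %f
EOF

cat "$dir/recoll.conf" "$dir/mimeview"
```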
|
|
|
2601 |
|
2346 |
If Use desktop preferences to choose document editor is checked in the
|
2602 |
If Use desktop preferences to choose document editor is checked in the
|
2347 |
user preferences, all mimeview entries will be ignored except the one
|
2603 |
user preferences, all mimeview entries will be ignored except the one
|
2348 |
labelled application/x-all (which is set to use xdg-open by default).
|
2604 |
labelled application/x-all (which is set to use xdg-open by default).
|
2349 |
|
2605 |
|
2350 |
----------------------------------------------------------------------
|
2606 |
----------------------------------------------------------------------
|