|
a/src/README |
|
b/src/README |
|
... |
|
... |
30 |
|
30 |
|
31 |
2. Indexing
|
31 |
2. Indexing
|
32 |
|
32 |
|
33 |
2.1. Introduction
|
33 |
2.1. Introduction
|
34 |
|
34 |
|
|
|
35 |
2.1.1. Indexing modes
|
|
|
36 |
|
|
|
37 |
2.1.2. Configurations, multiple indexes
|
|
|
38 |
|
|
|
39 |
2.1.3. Document types
|
|
|
40 |
|
|
|
41 |
2.1.4. Recovery
|
|
|
42 |
|
35 |
2.2. Index storage
|
43 |
2.2. Index storage
|
36 |
|
44 |
|
37 |
2.2.1. Xapian index formats
|
45 |
2.2.1. Xapian index formats
|
38 |
|
46 |
|
39 |
2.2.2. Security aspects
|
47 |
2.2.2. Security aspects
|
|
... |
|
... |
103 |
3.6. Desktop integration
|
111 |
3.6. Desktop integration
|
104 |
|
112 |
|
105 |
3.6.1. Hotkeying recoll
|
113 |
3.6.1. Hotkeying recoll
|
106 |
|
114 |
|
107 |
3.6.2. The KDE Kicker Recoll applet
|
115 |
3.6.2. The KDE Kicker Recoll applet
|
|
|
116 |
|
|
|
117 |
3.7. Multiple databases
|
108 |
|
118 |
|
109 |
4. Programming interface
|
119 |
4. Programming interface
|
110 |
|
120 |
|
111 |
4.1. Writing a document filter
|
121 |
4.1. Writing a document filter
|
112 |
|
122 |
|
|
... |
|
... |
286 |
Indexing is the process by which the set of documents is analyzed and the
|
296 |
Indexing is the process by which the set of documents is analyzed and the
|
287 |
data entered into the database. Recoll indexing is normally incremental:
|
297 |
data entered into the database. Recoll indexing is normally incremental:
|
288 |
documents will only be processed if they have been modified. On the first
|
298 |
documents will only be processed if they have been modified. On the first
|
289 |
execution, all documents will need processing. A full index build can be
|
299 |
execution, all documents will need processing. A full index build can be
|
290 |
forced later by specifying an option to the indexing command (recollindex
|
300 |
forced later by specifying an option to the indexing command (recollindex
|
291 |
-z).
|
301 |
-z or -Z).
|
292 |
|
302 |
|
|
|
303 |
The following sections give an overview of different aspects of the
|
|
|
304 |
indexing processes and configuration, with links to detailed sections.
|
|
|
305 |
|
|
|
306 |
----------------------------------------------------------------------
|
|
|
307 |
|
|
|
308 |
2.1.1. Indexing modes
|
|
|
309 |
|
293 |
Recoll indexing can be performed with two different methods:
|
310 |
Recoll indexing can be performed in two different modes:
|
294 |
|
311 |
|
295 |
* Periodic (or Batch) indexing: indexing takes place at discrete times,
|
312 |
* Periodic (or batch) indexing: indexing takes place at discrete times,
|
296 |
by executing the recollindex command. The typical usage is to have a
|
313 |
by executing the recollindex command. The typical usage is to have a
|
297 |
nightly indexing run programmed into your cron file.
|
314 |
nightly indexing run programmed into your cron file.
|
298 |
|
315 |
|
299 |
* Real time indexing: indexing takes place as soon as a file is created
|
316 |
* Real time indexing: indexing takes place as soon as a file is created
|
300 |
or changed. recollindex runs as a daemon and uses a file system
|
317 |
or changed. recollindex runs as a daemon and uses a file system
|
|
... |
|
... |
305 |
they can be combined by setting up multiple indexes (ie: use periodic
|
322 |
they can be combined by setting up multiple indexes (ie: use periodic
|
306 |
indexing on a big documentation directory, and real time indexing on a
|
323 |
indexing on a big documentation directory, and real time indexing on a
|
307 |
small home directory). Monitoring a big file system tree can consume
|
324 |
small home directory). Monitoring a big file system tree can consume
|
308 |
significant system resources.
|
325 |
significant system resources.
|
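As a sketch of the combination described above, suppose the big documentation tree is indexed through a separate configuration directory (the path `~/.recoll-docs` is a hypothetical example). The shell function below, `cron_line` (also a made-up name), prints a crontab entry that runs a batch `recollindex -c` pass on that configuration every night at 03:30. The default configuration covering the small home directory can then be monitored in real time with `recollindex -m`. The schedule and paths are illustrative assumptions; check the recollindex man page for the options available in your version.

```shell
#!/bin/sh
# Sketch only: the function name and the 03:30 schedule are illustrative choices.

# cron_line CONFDIR: print a crontab entry running a nightly batch
# indexing pass on the Recoll configuration directory CONFDIR.
cron_line() {
    printf '30 3 * * * recollindex -c %s > /dev/null 2>&1\n' "$1"
}

# Usage (not executed here):
#   Append the nightly batch run for the big documentation index:
#     (crontab -l 2>/dev/null; cron_line "$HOME/.recoll-docs") | crontab -
#   Monitor the default (home directory) configuration in real time:
#     recollindex -m
```

Keeping the batch index in its own configuration avoids having the real time monitor watch the big tree, which is the resource concern mentioned in the text.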
309 |
|
326 |
|
|
|
327 |
----------------------------------------------------------------------
|
|
|
328 |
|
|
|
329 |
2.1.2. Configurations, multiple indexes
|
|
|
330 |
|
|
|
331 |
The parameters describing what is to be indexed and local preferences are
|
|
|
332 |
defined in text files contained in a configuration directory.
|
|
|
333 |
|
|
|
334 |
All parameters have defaults, defined in system-wide files.
|
|
|
335 |
|
|
|
336 |
Without further configuration, Recoll will index all appropriate files
|
|
|
337 |
from your home directory, with a reasonable set of defaults.
|
|
|
338 |
|
|
|
339 |
A default personal configuration directory ($HOME/.recoll/) is created
|
|
|
340 |
when a Recoll program is first executed. It is possible to create other
|
|
|
341 |
configuration directories, and use them by setting the RECOLL_CONFDIR
|
|
|
342 |
environment variable, or giving the -c option to any of the Recoll
|
|
|
343 |
commands.
|
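A minimal sketch of setting up such an additional configuration directory, assuming that a personal recoll.conf only needs to override the parameters that differ from the system-wide defaults (here, just topdirs). The helper name `new_config` and the example paths are hypothetical. The resulting directory is then selected with the -c option or the RECOLL_CONFDIR variable, as described above.

```shell
#!/bin/sh
# Sketch: create an extra Recoll configuration directory that indexes a single
# subtree. The function name and example paths are hypothetical.

# new_config CONFDIR TOPDIR: write a minimal recoll.conf setting topdirs.
# All other parameters keep their system-wide defaults.
new_config() {
    mkdir -p "$1" || return 1
    printf 'topdirs = %s\n' "$2" > "$1/recoll.conf"
}

# Usage (not executed here):
#   new_config "$HOME/.recoll-docs" /usr/share/doc
#   recollindex -c "$HOME/.recoll-docs"            # select it with -c ...
#   RECOLL_CONFDIR="$HOME/.recoll-docs" recoll     # ... or with RECOLL_CONFDIR
```

topdirs accepts a space-separated list, so a path containing spaces would need quoting in the configuration file; the sketch does not handle that case.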
|
|
344 |
|
|
|
345 |
In some cases, it may be useful to index different areas of the file
|
|
|
346 |
system to separate databases. You can do this by using multiple
|
|
|
347 |
configuration directories, each indexing a file system area to a specific
|
|
|
348 |
database. Typically, this would be done to separate personal and shared
|
|
|
349 |
indexes, or to take advantage of the organization of your data to improve
|
|
|
350 |
search precision.
|
|
|
351 |
|
|
|
352 |
The generated indexes can be queried concurrently in a transparent manner.
|
|
|
353 |
|
|
|
354 |
For index generation, multiple configurations are totally independent of
|
|
|
355 |
each other. When multiple indexes are used for searches, some parameters
|
|
|
356 |
should be consistent among the configurations.
|
|
|
357 |
|
|
|
358 |
----------------------------------------------------------------------
|
|
|
359 |
|
|
|
360 |
2.1.3. Document types
|
|
|
361 |
|
310 |
Recoll knows about quite a few different document types. The parameters
|
362 |
Recoll knows about quite a few different document types. The parameters
|
311 |
for document type recognition and processing are set in configuration
|
363 |
for document type recognition and processing are set in configuration
|
312 |
files.
|
364 |
files.
|
313 |
|
365 |
|
314 |
Most file types, like HTML or word processing files, only hold one
|
366 |
Most file types, like HTML or word processing files, only hold one
|
315 |
document. Some file types, like email folders or zip archives, can hold
|
367 |
document. Some file types, like email folders or zip archives, can hold
|
316 |
many individually indexed documents, which may in turn be themselves
|
368 |
many individually indexed documents, which may themselves be compound
|
317 |
compound ones. Such hierarchies can go quite deep, and Recoll can process,
|
369 |
ones. Such hierarchies can go quite deep, and Recoll can process, for
|
318 |
for example, an ms-word document stored as an attachment to an email
|
370 |
example, an ms-word document stored as an attachment to an email message
|
319 |
message inside an email folder archived in a zip file...
|
371 |
inside an email folder archived in a zip file...
|
320 |
|
372 |
|
321 |
Recoll indexing processes plain text, HTML, OpenDocument
|
373 |
Recoll indexing processes plain text, HTML, OpenDocument
|
322 |
(Open/LibreOffice), email formats, and a few others internally.
|
374 |
(Open/LibreOffice), email formats, and a few others internally.
|
323 |
|
375 |
|
324 |
Other file types (ie: postscript, pdf, ms-word, rtf ...) need external
|
376 |
Other file types (ie: postscript, pdf, ms-word, rtf ...) need external
|
|
... |
|
... |
327 |
would be needed for indexing existing file types. This list can be
|
379 |
would be needed for indexing existing file types. This list can be
|
328 |
displayed by selecting the menu option File->Show Missing Helpers in the
|
380 |
displayed by selecting the menu option File->Show Missing Helpers in the
|
329 |
recoll GUI. It is stored in the missing text file inside the configuration
|
381 |
recoll GUI. It is stored in the missing text file inside the configuration
|
330 |
directory.
|
382 |
directory.
|
331 |
|
383 |
|
332 |
Without further configuration, Recoll will index all appropriate files
|
384 |
----------------------------------------------------------------------
|
333 |
from your home directory, with a reasonable set of defaults.
|
|
|
334 |
|
385 |
|
335 |
In some cases, it may be interesting to index different areas of the file
|
386 |
2.1.4. Recovery
|
336 |
system to separate databases. You can do this by using multiple
|
|
|
337 |
configuration directories, each indexing a file system area to a specific
|
|
|
338 |
database. See the section about using multiple databases for more
|
|
|
339 |
information on multiple configurations and indexes.
|
|
|
340 |
|
387 |
|
341 |
In the rare case where the index becomes corrupted (which can signal
|
388 |
In the rare case where the index becomes corrupted (which can signal
|
342 |
itself by weird search results or crashes), the index files need to be
|
389 |
itself by weird search results or crashes), the index files need to be
|
343 |
erased before restarting a clean indexing pass. Just delete the xapiandb
|
390 |
erased before restarting a clean indexing pass. Just delete the xapiandb
|
344 |
directory (see next section), or, alternatively, start the next
|
391 |
directory (see next section), or, alternatively, start the next
|
|
... |
|
... |
377 |
configuration section). This method would mainly be of use if you
|
424 |
configuration section). This method would mainly be of use if you
|
378 |
wanted to keep the configuration directory in its default location,
|
425 |
wanted to keep the configuration directory in its default location,
|
379 |
but desired another location for the index, typically out of disk
|
426 |
but desired another location for the index, typically out of disk
|
380 |
space concerns.
|
427 |
space concerns.
|
381 |
|
428 |
|
382 |
The size of the index is determined by the document set size, but the
|
429 |
The size of the index is determined by the size of the set of documents,
|
383 |
ratio can vary a lot. For a typical mixed set of documents, the index size
|
430 |
but the ratio can vary a lot. For a typical mixed set of documents, the
|
384 |
will often be close to the data set size. In specific cases (a set of
|
431 |
index size will often be close to the data set size. In specific cases (a
|
385 |
compressed mbox files for example), the index can become much bigger than
|
432 |
set of compressed mbox files for example), the index can become much
|
386 |
the documents. It may also be much smaller if the documents contain a lot
|
433 |
bigger than the documents. It may also be much smaller if the documents
|
387 |
of images or other non-indexed data (an extreme example being a set of mp3
|
434 |
contain a lot of images or other non-indexed data (an extreme example
|
388 |
files where only the tags would be indexed).
|
435 |
being a set of mp3 files where only the tags would be indexed).
|
389 |
|
436 |
|
390 |
Of course, images, sound and video do not increase the index size, which
|
437 |
Of course, images, sound and video do not increase the index size, which
|
391 |
means that nowadays (2012), typically, even a big index will be negligible
|
438 |
means that nowadays (2012), typically, even a big index will be negligible
|
392 |
compared to the total amount of data on the computer.
|
439 |
compared to the total amount of data on the computer.
|
393 |
|
440 |
|
|
... |
|
... |
407 |
format to the newer one. If you want to upgrade to the new format, or if a
|
454 |
format to the newer one. If you want to upgrade to the new format, or if a
|
408 |
very old index needs to be converted because its format is not supported
|
455 |
very old index needs to be converted because its format is not supported
|
409 |
any more, you will have to explicitly delete the old index, then run a
|
456 |
any more, you will have to explicitly delete the old index, then run a
|
410 |
normal indexing process.
|
457 |
normal indexing process.
|
411 |
|
458 |
|
412 |
Unfortunately, using the -z option to recollindex is not sufficient to
|
459 |
Using the -z option to recollindex is not sufficient to change the format:
|
413 |
change the format, you will have to delete all files inside the index
|
460 |
you will have to delete all files inside the index directory (typically
|
414 |
directory (typically ~/.recoll/xapiandb) before starting the indexing.
|
461 |
~/.recoll/xapiandb) before starting the indexing.
|
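A minimal sketch of that procedure, assuming the default index location ~/.recoll/xapiandb (pass another path for a non-default configuration). The helper name `clear_index` is hypothetical. It removes the files inside the index directory but keeps the directory itself, as the text describes, after which a normal recollindex run rebuilds the index. It assumes no recollindex process is running while the files are deleted.

```shell
#!/bin/sh
# Sketch: empty a Recoll index directory before a format-changing rebuild.
# clear_index is a hypothetical helper; stop any running recollindex first.

# clear_index [DBDIR]: delete everything inside DBDIR (default
# ~/.recoll/xapiandb), keeping the directory itself.
clear_index() {
    dbdir="${1:-$HOME/.recoll/xapiandb}"
    [ -d "$dbdir" ] || return 0          # nothing to clear
    # -mindepth 1 spares the top directory; -delete removes contents depth-first.
    find "$dbdir" -mindepth 1 -delete
}

# Usage (not executed here):
#   clear_index && recollindex
```

The -mindepth and -delete primaries are supported by GNU and BSD find but are not part of POSIX; on other systems, `rm -rf "$dbdir"/*` followed by recreating nothing is an equivalent manual step.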
415 |
|
462 |
|
416 |
----------------------------------------------------------------------
|
463 |
----------------------------------------------------------------------
|
417 |
|
464 |
|
418 |
2.2.2. Security aspects
|
465 |
2.2.2. Security aspects
|
419 |
|
466 |
|
|
... |
|
... |
437 |
|
484 |
|
438 |
Variables set inside the Recoll configuration files control which areas of
|
485 |
Variables set inside the Recoll configuration files control which areas of
|
439 |
the file system are indexed, and how files are processed. These variables
|
486 |
the file system are indexed, and how files are processed. These variables
|
440 |
can be set either by editing the text files or using the dialogs in the
|
487 |
can be set either by editing the text files or using the dialogs in the
|
441 |
recoll GUI.
|
488 |
recoll GUI.
|
442 |
|
|
|
443 |
You can also use multiple indexes defined by separate configurations,
|
|
|
444 |
typically to separate personal and shared indexes, or to take advantage of
|
|
|
445 |
the organization of your data to improve search precision.
|
|
|
446 |
|
489 |
|
447 |
The first time you start recoll, you will be asked whether or not you
|
490 |
The first time you start recoll, you will be asked whether or not you
|
448 |
would like it to build the index. If you want to adjust the configuration
|
491 |
would like it to build the index. If you want to adjust the configuration
|
449 |
before indexing, just click Cancel at this point, which will get you into
|
492 |
before indexing, just click Cancel at this point, which will get you into
|
450 |
the configuration interface. If you exit at this point, recoll will have
|
493 |
the configuration interface. If you exit at this point, recoll will have
|
|
... |
|
... |
457 |
most immediately useful variable you may be interested in is probably
|
500 |
most immediately useful variable you may be interested in is probably
|
458 |
topdirs, which determines what subtrees get indexed.
|
501 |
topdirs, which determines what subtrees get indexed.
|
459 |
|
502 |
|
460 |
The applications needed to index file types other than text, HTML or email
|
503 |
The applications needed to index file types other than text, HTML or email
|
461 |
(ie: pdf, postscript, ms-word...) are described in the external packages
|
504 |
(ie: pdf, postscript, ms-word...) are described in the external packages
|
462 |
section
|
505 |
section.
|
463 |
|
506 |
|
464 |
----------------------------------------------------------------------
|
507 |
----------------------------------------------------------------------
|
465 |
|
508 |
|
466 |
2.3.1. The indexing configuration GUI
|
509 |
2.3.1. The indexing configuration GUI
|
467 |
|
510 |
|
|
... |
|
... |
544 |
because some operations which are normally performed at the end of the
|
587 |
because some operations which are normally performed at the end of the
|
545 |
indexing pass will have been skipped (for example, the stemming and
|
588 |
indexing pass will have been skipped (for example, the stemming and
|
546 |
spelling databases will be missing or out of date). You just need to
|
589 |
spelling databases will be missing or out of date). You just need to
|
547 |
restart indexing at a later time to restore consistency. The indexing will
|
590 |
restart indexing at a later time to restore consistency. The indexing will
|
548 |
restart at the interruption point (the full file tree will be traversed,
|
591 |
restart at the interruption point (the full file tree will be traversed,
|
549 |
but files that were indexed up to the interruption and are still up to
|
592 |
but files that were indexed up to the interruption and for which the index
|
550 |
date will not need to be reindexed).
|
593 |
is still up to date will not need to be reindexed).
|
551 |
|
594 |
|
552 |
recollindex has a number of other options which are described in its man
|
595 |
recollindex has a number of other options which are described in its man
|
553 |
page.
|
596 |
page. Only a few will be described here.
|
554 |
|
597 |
|
|
|
598 |
Option -z will reset the index when starting. This is almost the same as
|
|
|
599 |
destroying the index files (the nuance is that the Xapian format version
|
|
|
600 |
will not be changed).
|
|
|
601 |
|
|
|
602 |
Option -Z will force the update of all documents without resetting the
|
|
|
603 |
index first. This will not have the "clean start" aspect of -z, but the
|
|
|
604 |
advantage is that the index will remain available for querying while it is
|
|
|
605 |
rebuilt, which can be a significant advantage if it is very big (some
|
|
|
606 |
installations need days for a full index rebuild).
|
|
|
607 |
|
555 |
Of special interest maybe are the -i and -f options. -i allows indexing an
|
608 |
Also of special interest may be the -i and -f options. -i allows
|
556 |
explicit list of files (given as command line parameters or read on
|
609 |
indexing an explicit list of files (given as command line parameters or
|
557 |
stdin). -f tells recollindex to ignore file selection parameters from the
|
610 |
read on stdin). -f tells recollindex to ignore file selection parameters
|
558 |
configuration. Together, these options allow building a custom file
|
611 |
from the configuration. Together, these options allow building a custom
|
559 |
selection process for some area of the file system, by adding the top
|
612 |
file selection process for some area of the file system, by adding the top
|
560 |
directory to the skippedPaths list and using an appropriate file selection
|
613 |
directory to the skippedPaths list and using an appropriate file selection
|
561 |
method to build the file list to be fed to recollindex -if .
|
614 |
method to build the file list to be fed to recollindex -if. Trivial
|
|
|
615 |
example:
|
562 |
|
616 |
|
563 |
recollindex -i will not descend into directory parameters, but just add
|
617 |
find . -name indexable.txt -print | recollindex -if
|
564 |
them as index entries. It is up to the external file selection method to
|
618 |
|
565 |
build the complete file list.
|
619 |
|
|
|
620 |
recollindex -i will not descend into subdirectories specified as
|
|
|
621 |
parameters, but just add them as index entries. It is up to the external
|
|
|
622 |
file selection method to build the complete file list.
|
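Building on the trivial find example above, here is a sketch of a slightly more selective file list. The helper name `recent_files` and the one-day window are hypothetical choices. Because recollindex -i does not descend into directories, the list is restricted to regular files with -type f. The top directory is assumed to be listed in skippedPaths so that normal indexing passes leave it alone.

```shell
#!/bin/sh
# Sketch: feed recollindex -if a custom file list for one area of the file system.
# recent_files is a hypothetical helper; the 1-day window is an arbitrary example.

# recent_files DIR: print regular files under DIR modified in the last day.
# Only files are listed, because recollindex -i will not descend into directories.
recent_files() {
    find "$1" -type f -mtime -1 -print
}

# Usage (not executed here), with DIR listed in skippedPaths:
#   recent_files /data/reports | recollindex -if
```

The list is newline-separated, so this sketch assumes file names do not contain newlines.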
566 |
|
623 |
|
567 |
----------------------------------------------------------------------
|
624 |
----------------------------------------------------------------------
|
568 |
|
625 |
|
569 |
2.5.2. Using cron to automate indexing
|
626 |
2.5.2. Using cron to automate indexing
|
570 |
|
627 |
|
|
... |
|
... |
640 |
quite big, depending on the log level.
|
697 |
quite big, depending on the log level.
|
641 |
|
698 |
|
642 |
When building Recoll, the real time indexing support can be customised
|
699 |
When building Recoll, the real time indexing support can be customised
|
643 |
during package configuration with the --with[out]-fam or
|
700 |
during package configuration with the --with[out]-fam or
|
644 |
--with[out]-inotify options. The default is currently to include inotify
|
701 |
--with[out]-inotify options. The default is currently to include inotify
|
645 |
monitoring on systems that support it, and, as of recoll 1.17, gamin
|
702 |
monitoring on systems that support it, and, as of Recoll 1.17, gamin
|
646 |
support on FreeBSD.
|
703 |
support on FreeBSD.
|
647 |
|
704 |
|
648 |
While it is convenient that data is indexed in real time, repeated
|
705 |
While it is convenient that data is indexed in real time, repeated
|
649 |
indexing can generate a significant load on the system when files such as
|
706 |
indexing can generate a significant load on the system when files such as
|
650 |
email folders change. Also, monitoring large file trees by itself
|
707 |
email folders change. Also, monitoring large file trees by itself
|
|
... |
|
... |
771 |
punctuation, newlines and all - except for wildcard characters (single ?
|
828 |
punctuation, newlines and all - except for wildcard characters (single ?
|
772 |
characters are ok). Recoll will process it and produce a meaningful
|
829 |
characters are ok). Recoll will process it and produce a meaningful
|
773 |
search. This is what most differentiates this mode from the Query Language
|
830 |
search. This is what most differentiates this mode from the Query Language
|
774 |
mode, where you have to care about the syntax.
|
831 |
mode, where you have to care about the syntax.
|
775 |
|
832 |
|
776 |
You can use the Tools / Advanced search dialog for more complex searches.
|
833 |
You can use the Tools->Advanced search dialog for more complex searches.
|
777 |
|
834 |
|
778 |
----------------------------------------------------------------------
|
835 |
----------------------------------------------------------------------
|
779 |
|
836 |
|
780 |
3.1.2. The default result list
|
837 |
3.1.2. The default result list
|
781 |
|
838 |
|
|
... |
|
... |
922 |
|
979 |
|
923 |
You can display successive or previous documents from the result list
|
980 |
You can display successive or previous documents from the result list
|
924 |
inside a preview tab by typing Shift+Down or Shift+Up (Down and Up are the
|
981 |
inside a preview tab by typing Shift+Down or Shift+Up (Down and Up are the
|
925 |
arrow keys).
|
982 |
arrow keys).
|
926 |
|
983 |
|
927 |
The preview tabs have an internal incremental search function. You
|
|
|
928 |
initiate the search either by typing a / (slash) or CTL-F inside the text
|
|
|
929 |
area or by clicking into the Search for: text field and entering the
|
|
|
930 |
search string. You can then use the Next and Previous buttons to find the
|
|
|
931 |
next/previous occurrence. You can also type F3 inside the text area to get
|
|
|
932 |
to the next occurrence.
|
|
|
933 |
|
|
|
934 |
If you have a search string entered and you use Ctrl-Up/Ctrl-Down to
|
|
|
935 |
browse the results, the search is initiated for each successive document.
|
|
|
936 |
If the string is found, the cursor will be positioned at the first
|
|
|
937 |
occurrence of the search string.
|
|
|
938 |
|
|
|
939 |
A right-click menu in the text area allows switching between displaying
|
984 |
A right-click menu in the text area allows switching between displaying
|
940 |
the main text or the contents of fields associated with the document (ie:
|
985 |
the main text or the contents of fields associated with the document (ie:
|
941 |
author, abstract, etc.). This is especially useful in cases where the term
|
986 |
author, abstract, etc.). This is especially useful in cases where the term
|
942 |
match did not occur in the main text but in one of the fields.
|
987 |
match did not occur in the main text but in one of the fields. In the case
|
|
|
988 |
of images, you can switch between three displays: the image itself, the
|
|
|
989 |
image metadata as extracted by exiftool, and the fields, which are the
|
|
|
990 |
metadata stored in the index.
|
943 |
|
991 |
|
944 |
You can print the current preview window contents by typing Ctrl-P (Ctrl +
|
992 |
You can print the current preview window contents by typing Ctrl-P (Ctrl +
|
945 |
P) in the window text.
|
993 |
P) in the window text.
|
|
|
994 |
|
|
|
995 |
----------------------------------------------------------------------
|
|
|
996 |
|
|
|
997 |
3.1.4.1. Searching inside the preview
|
|
|
998 |
|
|
|
999 |
The preview window has an internal search capability, mostly controlled by
|
|
|
1000 |
the panel at the bottom of the window, which works in two modes: as a
|
|
|
1001 |
classic editor-style incremental search, where we look for the text entered in
|
|
|
1002 |
the entry zone, or as a way to walk the matches between the document and
|
|
|
1003 |
the Recoll query that found it.
|
|
|
1004 |
|
|
|
1005 |
Incremental text search
|
|
|
1006 |
|
|
|
1007 |
The preview tabs have an internal incremental search function. You
|
|
|
1008 |
initiate the search either by typing a / (slash) or Ctrl-F inside
|
|
|
1009 |
the text area or by clicking into the Search for: text field and
|
|
|
1010 |
entering the search string. You can then use the Next and Previous
|
|
|
1011 |
buttons to find the next/previous occurrence. You can also type F3
|
|
|
1012 |
inside the text area to get to the next occurrence.
|
|
|
1013 |
|
|
|
1014 |
If you have a search string entered and you use Ctrl-Up/Ctrl-Down
|
|
|
1015 |
to browse the results, the search is initiated for each successive
|
|
|
1016 |
document. If the string is found, the cursor will be positioned at
|
|
|
1017 |
the first occurrence of the search string.
|
|
|
1018 |
|
|
|
1019 |
Walking the match lists
|
|
|
1020 |
|
|
|
1021 |
If the entry area is empty when you click the Next or Previous
|
|
|
1022 |
buttons, the editor will be scrolled to show the next match to any
|
|
|
1023 |
search term (the next highlighted zone). If you select a search
|
|
|
1024 |
group from the dropdown list and click Next or Previous, the match
|
|
|
1025 |
list for this group will be walked. This is not the same as a text
|
|
|
1026 |
search, because the occurrences will include non-exact matches (as
|
|
|
1027 |
caused by stemming or wildcards). The search will revert to the
|
|
|
1028 |
text mode as soon as you edit the entry area.
|
946 |
|
1029 |
|
947 |
----------------------------------------------------------------------
|
1030 |
----------------------------------------------------------------------
|
948 |
|
1031 |
|
949 |
3.1.5. Complex/advanced search
|
1032 |
3.1.5. Complex/advanced search
|
950 |
|
1033 |
|
|
... |
|
... |
1102 |
|
1185 |
|
1103 |
----------------------------------------------------------------------
|
1186 |
----------------------------------------------------------------------
|
1104 |
|
1187 |
|
1105 |
3.1.7. Multiple databases
|
1188 |
3.1.7. Multiple databases
|
1106 |
|
1189 |
|
1107 |
Multiple Recoll databases or indexes can be created by using several
|
1190 |
See the section describing the use of multiple indexes for generalities.
|
1108 |
configuration directories which are usually set to index different areas
|
1191 |
Only the aspects concerning the recoll GUI are described here.
|
1109 |
of the file system. A specific index can be selected for updating or
|
|
|
1110 |
searching, using the RECOLL_CONFDIR environment variable or the -c option
|
|
|
1111 |
to recoll and recollindex.
|
|
|
1112 |
|
1192 |
|
1113 |
A recollindex program instance can only update one specific index.
|
|
|
1114 |
|
|
|
1115 |
A recoll program instance is also associated with a specific index, which
|
1193 |
A recoll program instance is always associated with a specific index,
|
1116 |
is the one to be updated by its indexing thread, but it can use any number
|
1194 |
which is the one to be updated when requested from the File menu, but it
|
1117 |
of Recoll indexes for searching. The external indexes can be selected
|
1195 |
can use any number of Recoll indexes for searching. The external indexes
|
1118 |
through the external indexes tab in the preferences dialog.
|
1196 |
can be selected through the external indexes tab in the preferences
|
|
|
1197 |
dialog.
|
1119 |
|
1198 |
|
1120 |
Index selection is performed in two phases. A set of all usable indexes
|
1199 |
Index selection is performed in two phases. A set of all usable indexes
|
1121 |
must first be defined, and then the subset of indexes to be used for
|
1200 |
must first be defined, and then the subset of indexes to be used for
|
1122 |
searching. Of course, these parameters are retained across program
|
1201 |
searching. Of course, these parameters are retained across program
|
1123 |
executions (they are kept separately for each Recoll configuration). The
|
1202 |
executions (they are kept separately for each Recoll configuration). The
|
|
... |
|
... |
1134 |
system administrator so that every user does not have to do it. The
|
1213 |
system administrator so that every user does not have to do it. The
|
1135 |
variable should define a colon-separated list of index directories, ie:
|
1214 |
variable should define a colon-separated list of index directories, ie:
|
1136 |
|
1215 |
|
1137 |
export RECOLL_EXTRA_DBS=/some/place/xapiandb:/some/other/db
|
1216 |
export RECOLL_EXTRA_DBS=/some/place/xapiandb:/some/other/db
|
1138 |
|
1217 |
|
1139 |
A typical usage scenario for the multiple index feature would be for a
|
1218 |
Another environment variable, RECOLL_ACTIVE_EXTRA_DBS, allows adding to the
|
1140 |
system administrator to set up a central index for shared data, that you
|
1219 |
active list of indexes. This variable was suggested and implemented by a
|
1141 |
choose to search or not in addition to your personal data. Of course,
|
1220 |
Recoll user. It is mostly useful if you use scripts to mount external
|
1142 |
there are other possibilities. There are many cases where you know the
|
1221 |
volumes with Recoll indexes. By using RECOLL_EXTRA_DBS and
|
1143 |
subset of files that should be searched, and where narrowing the search
|
1222 |
RECOLL_ACTIVE_EXTRA_DBS, you can add and activate the index for the
|
1144 |
can improve the results. You can achieve approximately the same effect
|
1223 |
mounted volume when starting recoll.
|
1145 |
with the directory filter in advanced search, but multiple indexes will
|
1224 |
|
1146 |
have much better performance and may be worth the trouble.
|
1225 |
RECOLL_ACTIVE_EXTRA_DBS is available for Recoll versions 1.17.2 and later.
|
|
|
1226 |
A change was made in the same update so that recoll will automatically
|
|
|
1227 |
deactivate unreachable indexes when starting up.
|
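A sketch of the mounted-volume scenario described above, under a loudly stated assumption: each volume is supposed to keep its Recoll index in a xapiandb directory at its top level (this layout, the /media mount root, and the helper name `extra_dbs` are hypothetical). The function builds the colon-separated list expected by RECOLL_EXTRA_DBS, and the same list is passed in RECOLL_ACTIVE_EXTRA_DBS so that the indexes are also activated (Recoll 1.17.2 and later).

```shell
#!/bin/sh
# Sketch: add and activate indexes found on mounted volumes when starting recoll.
# Assumes (hypothetically) that each volume keeps its index in <volume>/xapiandb.

# extra_dbs ROOT: print a colon-separated list of the ROOT/*/xapiandb directories.
extra_dbs() {
    list=""
    for db in "$1"/*/xapiandb; do
        [ -d "$db" ] || continue          # skip the unexpanded glob / non-directories
        list="${list:+$list:}$db"         # add a ':' separator after the first entry
    done
    printf '%s\n' "$list"
}

# Usage (not executed here):
#   dbs=$(extra_dbs /media)
#   RECOLL_EXTRA_DBS="$dbs" RECOLL_ACTIVE_EXTRA_DBS="$dbs" recoll
```

If the system administrator already sets RECOLL_EXTRA_DBS, the new list would need to be appended to the existing value rather than replace it.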
1147 |
|
1228 |
|
1148 |
----------------------------------------------------------------------
|
1229 |
----------------------------------------------------------------------
|
1149 |
|
1230 |
|
1150 |
3.1.8. Document history
|
1231 |
3.1.8. Document history
|
1151 |
|
1232 |
|
|
... |
|
... |
1530 |
The default value for the paragraph format string is:
|
1611 |
The default value for the paragraph format string is:
|
1531 |
|
1612 |
|
1532 |
<img src="%I" align="left">%R %S %L <b>%T</b><br>
|
1613 |
<img src="%I" align="left">%R %S %L <b>%T</b><br>
|
1533 |
%M %D <i>%U</i> %i<br>
|
1614 |
%M %D <i>%U</i> %i<br>
|
1534 |
%A %K
|
1615 |
%A %K
|
1535 |
|
|
|
1536 |
|
1616 |
|
1537 |
You may, for example, try the following for a more web-like experience:
|
1617 |
You may, for example, try the following for a more web-like experience:
|
1538 |
|
1618 |
|
1539 |
<u><b><a href="P%N">%T</a></b></u><br>
|
1619 |
<u><b><a href="P%N">%T</a></b></u><br>
|
1540 |
%A<font color=#008000>%U - %S</font> - %L
|
1620 |
%A<font color=#008000>%U - %S</font> - %L
|
1541 |
|
|
|
1542 |
|
1621 |
|
|
|
1622 |
Note that the P%N link in the above paragraph makes the title a preview
|
1543 |
Or the clean looking:
|
1623 |
link. Or the clean looking:
|
1544 |
|
1624 |
|
1545 |
<img src="%I" align="left">%L <font color="#900000">%R</font>
|
1625 |
<img src="%I" align="left">%L <font color="#900000">%R</font>
|
1546 |
<b>%T</b><br>%S
|
1626 |
<b>%T</b><br>%S
|
1547 |
<font color="#808080"><i>%U</i></font>
|
1627 |
<font color="#808080"><i>%U</i></font>
|
1548 |
<table bgcolor="#e0e0e0">
|
1628 |
<table bgcolor="#e0e0e0">
|
1549 |
<tr><td><div>%A</div></td></tr>
|
1629 |
<tr><td><div>%A</div></td></tr>
|
1550 |
</table>%K
|
1630 |
</table>%K
|
1551 |
|
|
|
1552 |
|
|
|
1553 |
Note that the P%N link in the above paragraph makes the title a preview
|
|
|
1554 |
link.
|
|
|
1555 |
|
1631 |
|
1556 |
These samples, and some others, are on the web site, with pictures to show
|
1632 |
These samples, and some others, are on the web site, with pictures to show
|
1557 |
how they look.
|
1633 |
how they look.
|
1558 |
|
1634 |
|
1559 |
It is also possible to define the value of the snippet separator inside
|
1635 |
It is also possible to define the value of the snippet separator inside
|
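The paragraph format strings in this hunk are HTML fragments. Each %-escape in them (%T, %U, %A, %N, ...) is replaced by a value taken from the result. The Python sketch below only illustrates that substitution step, under two simple assumptions: an escape is a % followed by one letter, and unknown escapes expand to nothing. It is not Recoll's implementation, and the field values in the usage part are invented.

```python
import re

# Hypothetical sketch of %-escape expansion in a result paragraph format.
# An escape is '%' followed by one ASCII letter (e.g. %T, %U, %N, %i).
# Unknown letters expand to an empty string: an assumption made for this
# example, not documented Recoll behaviour.
_ESCAPE = re.compile(r"%([A-Za-z])")


def expand_paragraph(fmt, fields):
    """Replace every %X escape in fmt with fields.get('X', '')."""
    return _ESCAPE.sub(lambda m: fields.get(m.group(1), ""), fmt)


if __name__ == "__main__":
    web_like = ('<u><b><a href="P%N">%T</a></b></u><br>\n'
                '%A<font color=#008000>%U - %S</font> - %L')
    # Invented sample values, for illustration only.
    sample = {"N": "3", "T": "Annual report", "A": "Sales grew...",
              "U": "file:///home/me/report.pdf", "S": "120 KB", "L": ""}
    print(expand_paragraph(web_like, sample))
```

Assuming %N expands to the result number, the web-like sample makes the title point to P3. This is how the P%N link turns the title into a preview link.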
...  | ...  |
1691 | 1767 |
1692 | 1768 |   The language is roughly based on the (seemingly defunct) Xesam user search
1693 | 1769 |   language specification.
1694 | 1770 |
1695 | 1771 |   If the results of a query language search puzzle you and you doubt what
1696 |      | - has been actually searched for, you can use the GUI show query link at the
     | 1772 | + has been actually searched for, you can use the GUI Show Query link at the
1697 | 1773 |   top of the result list to check the exact query which was finally executed
1698 | 1774 |   by Xapian.
1699 | 1775 |
1700 | 1776 |   Here follows a sample request that we are going to explain:
1701 | 1777 |
...  | ...  |
1945 | 2021 |   a new recoll GUI instance every time (even if it is already running). You
1946 | 2022 |   may find it useful anyway.
1947 | 2023 |
1948 | 2024 |   ----------------------------------------------------------------------
1949 | 2025 |
     | 2026 | + 3.7. Multiple databases
     | 2027 | +
     | 2028 | + Multiple Recoll databases or indexes can be created by using several
     | 2029 | + configuration directories which are usually set to index different areas
     | 2030 | + of the file system. A specific index can be selected for updating or
     | 2031 | + searching, using the RECOLL_CONFDIR environment variable or the -c option
     | 2032 | + to recoll and recollindex.
     | 2033 | +
     | 2034 | + A typical usage scenario for the multiple index feature would be for a
     | 2035 | + system administrator to set up a central index for shared data, which you
     | 2036 | + can choose to search or not in addition to your personal data. Of course,
     | 2037 | + there are other possibilities. There are many cases where you know the
     | 2038 | + subset of files that should be searched, and where narrowing the search
     | 2039 | + can improve the results. You can achieve approximately the same effect
     | 2040 | + with the directory filter in advanced search, but multiple indexes will
     | 2041 | + have much better performance and may be worth the trouble.
     | 2042 | +
     | 2043 | + A recollindex program instance can only update one specific index.
     | 2044 | +
     | 2045 | + The main index (defined by RECOLL_CONFDIR or -c) is always active. If this
     | 2046 | + is undesirable, you can set up your base configuration to index an empty
     | 2047 | + directory.
     | 2048 | +
     | 2049 | + The different search interfaces (GUI, command line, ...) have different
     | 2050 | + methods to define the set of indexes to be used; see the appropriate
     | 2051 | + section.
     | 2052 | +
     | 2053 | + If a set of multiple indexes is to be used together for searches, some
     | 2054 | + configuration parameters must be consistent among the set. These are
     | 2055 | + parameters which need to be the same when indexing and searching. As the
     | 2056 | + parameters come from the main configuration when searching, they need to
     | 2057 | + be compatible with what was set when creating the other indexes (which
     | 2058 | + came from their respective configuration directories). Most of the relevant
     | 2059 | + parameters are described in the following linked section.
     | 2060 | +
     | 2061 | + ----------------------------------------------------------------------
     | 2062 | +
1950 | 2063 |   Chapter 4. Programming interface
1951 | 2064 |
1952 | 2065 |   Recoll has an Application programming Interface, usable both for indexing
1953 | 2066 |   and searching, currently accessible from the Python language.
1954 | 2067 |
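The 3.7 Multiple databases text added above states two rules. An index is selected with RECOLL_CONFDIR or -c, and one recollindex instance updates a single index. Below is a minimal Python sketch of updating several indexes under those rules. The second directory name is hypothetical, and the script only prints the commands unless dry_run is turned off.

```python
import os
import subprocess


def index_commands(confdirs):
    """Build one recollindex command line per configuration directory.

    A recollindex instance only updates one index, so each directory
    (i.e. each index) gets its own invocation, selected with -c.
    """
    return [["recollindex", "-c", d] for d in confdirs]


def env_with_confdir(confdir, base=None):
    """Return a copy of the environment selecting confdir through
    RECOLL_CONFDIR, the alternative to passing -c."""
    env = dict(os.environ if base is None else base)
    env["RECOLL_CONFDIR"] = confdir
    return env


def update_all(confdirs, dry_run=True):
    """Update every index in turn; with dry_run, only print the commands."""
    for cmd in index_commands(confdirs):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical layout: the personal index and a separate shared one.
    update_all([os.path.expanduser("~/.recoll"),
                os.path.expanduser("~/.recoll-shared")])
```

The sketch runs one process per configuration directory, one after the other, which follows directly from the one-index-per-instance rule. Calling update_all(dirs, dry_run=False) actually runs recollindex, which must then be on the PATH.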
...  | ...  |
2014 | 2127 |   the filter if the operation is for indexing or previewing. Some filters
2015 | 2128 |   use this to output a slightly different format, for example stripping
2016 | 2129 |   uninteresting repeated keywords (ie: Subject: for email) when indexing.
2017 | 2130 |   This is not essential.
2018 | 2131 |
2019 |      | - You should look to one of the simple filters, for example rclps for a
     | 2132 | + You should look at one of the simple filters, for example rclps for a
2020 | 2133 |   starting point.
2021 | 2134 |
2022 | 2135 |   Don't forget to make your filter executable before testing !
2023 | 2136 |
2024 | 2137 |   ----------------------------------------------------------------------
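The reminder about making a new filter executable usually means running chmod a+x on the script. The same operation from Python can be handy in a setup script, and is sketched below. The filter name mentioned in the comment is hypothetical.

```python
import os
import stat
import sys


def make_executable(path):
    """Add execute permission for user, group and others while keeping the
    other permission bits: the same effect as `chmod a+x path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


if __name__ == "__main__":
    # e.g. python3 mkexec.py rclmyformat   (hypothetical filter script name)
    for name in sys.argv[1:]:
        make_executable(name)
```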
...  | ...  |
2617 | 2730 |
2618 | 2731 |   * QTDIR should point to the directory above the one that holds the qt
2619 | 2732 |   include files (ie: if qt.h is /usr/local/qt/include/qt.h, QTDIR should
2620 | 2733 |   be /usr/local/qt).
2621 | 2734 |
2622 |      | - * QMAKESPECS should be set to the name of one of the qt mkspecs
     | 2735 | + * QMAKESPECS should be set to the name of one of the Qt mkspecs
2623 | 2736 |   sub-directories (ie: linux-g++).
2624 | 2737 |
2625 | 2738 |   On many Linux systems, QTDIR is set by the login scripts, and QMAKESPECS
2626 | 2739 |   is not needed because there is a default link in mkspecs/.
2627 | 2740 |
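The QTDIR rule above amounts to going two directory levels up from qt.h. This small Python sketch applies it. It assumes, as in the example, that qt.h sits directly in the include directory; an installation with headers in deeper subdirectories would need a different rule.

```python
from pathlib import PurePosixPath


def qtdir_from_header(qt_h):
    """QTDIR is the directory above the one holding the Qt include files,
    i.e. two levels up from qt.h: /usr/local/qt/include/qt.h -> /usr/local/qt."""
    return str(PurePosixPath(qt_h).parent.parent)


if __name__ == "__main__":
    qtdir = qtdir_from_header("/usr/local/qt/include/qt.h")
    # Print a Bourne-shell assignment that could go into a login script.
    print(f"QTDIR={qtdir}; export QTDIR")
```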
...  | ...  |
2983 | 3096 |   defaultcharset
2984 | 3097 |
2985 | 3098 |   The name of the character set used for files that do not contain a
2986 | 3099 |   character set definition (ie: plain text files). This can be
2987 | 3100 |   redefined for any sub-directory. If it is not set at all, the
2988 |      | - character set used is the one defined by the nls environment
2989 |      | - (LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
     | 3101 | + character set used is the one defined by the nls environment (
     | 3102 | + LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
2990 | 3103 |
2991 | 3104 |   unac_except_trans
2992 | 3105 |
2993 | 3106 |   This is a list of characters, encoded in UTF-8, which should be
2994 | 3107 |   handled specially when converting text to unaccented lowercase.