|
a/src/README |
|
b/src/README |
|
... |
|
... |
46 |
|
46 |
|
47 |
2.2.2. Security aspects
|
47 |
2.2.2. Security aspects
|
48 |
|
48 |
|
49 |
2.3. Index configuration
|
49 |
2.3. Index configuration
|
50 |
|
50 |
|
|
|
51 |
2.3.1. Multiple indexes
|
|
|
52 |
|
51 |
2.3.1. Index case and diacritics sensitivity
|
53 |
2.3.2. Index case and diacritics sensitivity
|
52 |
|
54 |
|
53 |
2.3.2. The index configuration GUI
|
55 |
2.3.3. The index configuration GUI
|
54 |
|
56 |
|
55 |
2.4. Using Beagle WEB browser plugins
|
57 |
2.4. Using Beagle WEB browser plugins
|
56 |
|
58 |
|
57 |
2.5. Periodic indexing
|
59 |
2.5. Periodic indexing
|
58 |
|
60 |
|
|
... |
|
... |
79 |
|
81 |
|
80 |
3.1.5. Complex/advanced search
|
82 |
3.1.5. Complex/advanced search
|
81 |
|
83 |
|
82 |
3.1.6. The term explorer tool
|
84 |
3.1.6. The term explorer tool
|
83 |
|
85 |
|
84 |
3.1.7. Multiple databases
|
86 |
3.1.7. Multiple indexes
|
85 |
|
87 |
|
86 |
3.1.8. Document history
|
88 |
3.1.8. Document history
|
87 |
|
89 |
|
88 |
3.1.9. Sorting search results and collapsing
|
90 |
3.1.9. Sorting search results and collapsing
|
89 |
duplicates
|
91 |
duplicates
|
|
... |
|
... |
115 |
3.7. Desktop integration
|
117 |
3.7. Desktop integration
|
116 |
|
118 |
|
117 |
3.7.1. Hotkeying recoll
|
119 |
3.7.1. Hotkeying recoll
|
118 |
|
120 |
|
119 |
3.7.2. The KDE Kicker Recoll applet
|
121 |
3.7.2. The KDE Kicker Recoll applet
|
120 |
|
|
|
121 |
3.8. Multiple databases
|
|
|
122 |
|
122 |
|
123 |
4. Programming interface
|
123 |
4. Programming interface
|
124 |
|
124 |
|
125 |
4.1. Writing a document filter
|
125 |
4.1. Writing a document filter
|
126 |
|
126 |
|
|
... |
|
... |
188 |
you may first want to customize the configuration to restrict the indexed
|
188 |
you may first want to customize the configuration to restrict the indexed
|
189 |
area.
|
189 |
area.
|
190 |
|
190 |
|
191 |
Also be aware that you may need to install the appropriate supporting
|
191 |
Also be aware that you may need to install the appropriate supporting
|
192 |
applications for document types that need them (for example antiword for
|
192 |
applications for document types that need them (for example antiword for
|
193 |
ms-word files).
|
193 |
Microsoft Word files).
|
194 |
|
194 |
|
195 |
----------------------------------------------------------------------
|
195 |
----------------------------------------------------------------------
|
196 |
|
196 |
|
197 |
1.2. Full text search
|
197 |
1.2. Full text search
|
198 |
|
198 |
|
|
... |
|
... |
203 |
return a list of matching documents, ordered so that the most relevant
|
203 |
return a list of matching documents, ordered so that the most relevant
|
204 |
documents will appear first.
|
204 |
documents will appear first.
|
205 |
|
205 |
|
206 |
You do not need to remember in what file or email message you stored a
|
206 |
You do not need to remember in what file or email message you stored a
|
207 |
given piece of information. You just ask for related terms, and the tool
|
207 |
given piece of information. You just ask for related terms, and the tool
|
208 |
will return a list of documents where those terms are prominent, in a
|
208 |
will return a list of documents where these terms are prominent, in a
|
209 |
similar way to Internet search engines.
|
209 |
similar way to Internet search engines.
|
210 |
|
210 |
|
211 |
A search application tries to determine which documents are most relevant
|
211 |
A search application tries to determine which documents are most relevant
|
212 |
to the search terms you provide. Computer algorithms for determining
|
212 |
to the search terms you provide. Computer algorithms for determining
|
213 |
relevance can be very complex, and in general are inferior to the power of
|
213 |
relevance can be very complex, and in general are inferior to the power of
|
|
... |
|
... |
253 |
|
253 |
|
254 |
Stemming is the process by which Recoll reduces words to their radicals so
|
254 |
Stemming is the process by which Recoll reduces words to their radicals so
|
255 |
that searching does not depend, for example, on a word being singular or
|
255 |
that searching does not depend, for example, on a word being singular or
|
256 |
plural (floor, floors), or on a verb tense (flooring, floored). Because
|
256 |
plural (floor, floors), or on a verb tense (flooring, floored). Because
|
257 |
the mechanisms used for stemming depend on the specific grammatical rules
|
257 |
the mechanisms used for stemming depend on the specific grammatical rules
|
258 |
for each language, there is a separate stemmer module for most common
|
258 |
for each language, there is a separate Xapian stemmer module for most
|
259 |
languages where stemming makes sense.
|
259 |
common languages where stemming makes sense.
|
260 |
|
260 |
|
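The effect of stemming described above can be illustrated with a toy suffix stripper. This is only a sketch: Recoll relies on the Xapian stemmer modules, whose rules are far more complete, and the function name and suffix list below are assumptions made for the example.

```python
# Toy illustration of stemming. This is NOT the Xapian stemmer used by Recoll.
# The suffix list is an assumption chosen to cover the floor/floors example.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Lowercase the word and strip at most one common English suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so that short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ("floor", "floors", "flooring", "floored"):
        print(w, "->", toy_stem(w))
```

All four words reduce to the same radical, which is why a search for one of them can match documents containing the others.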
261 |
Recoll stores the unstemmed versions of terms in the main index and uses
|
261 |
Recoll stores the unstemmed versions of terms in the main index and uses
|
262 |
auxiliary databases for term expansion (one for each stemming language),
|
262 |
auxiliary databases for term expansion (one for each stemming language),
|
263 |
which means that you can switch stemming languages between searches, or
|
263 |
which means that you can switch stemming languages between searches, or
|
264 |
add a language without needing a full reindex.
|
264 |
add a language without needing a full reindex.
|
|
... |
|
... |
269 |
|
269 |
|
270 |
Recoll currently makes no attempt at automatic language recognition, which
|
270 |
Recoll currently makes no attempt at automatic language recognition, which
|
271 |
means that the stemmer will sometimes be applied to terms from other
|
271 |
means that the stemmer will sometimes be applied to terms from other
|
272 |
languages with potentially strange results. In practice, even if this
|
272 |
languages with potentially strange results. In practice, even if this
|
273 |
introduces possibilities of confusion, this approach has been proven quite
|
273 |
introduces possibilities of confusion, this approach has been proven quite
|
274 |
useful, and, awaiting the addition of an automatic language recognition
|
|
|
275 |
module to Recoll, it is much less cumbersome than separating your
|
274 |
useful, and it is much less cumbersome than separating your documents
|
276 |
documents according to what language they are written in.
|
275 |
according to what language they are written in.
|
277 |
|
276 |
|
278 |
Before version 1.18, Recoll always stripped most accents and diacritics
|
277 |
Before version 1.18, Recoll stripped most accents and diacritics from
|
279 |
from terms, and converted them to lower case before storing them in the
|
278 |
terms, and converted them to lower case before either storing them in the
|
280 |
index. As a consequence, it was impossible to search for a particular
|
279 |
index or searching for them. As a consequence, it was impossible to search
|
281 |
capitalization of a term (US / us), or to discriminate two terms based on
|
280 |
for a particular capitalization of a term (US / us), or to discriminate
|
282 |
diacritics (sake / saké, mate / maté).
|
281 |
two terms based on diacritics (sake / saké, mate / maté).
|
283 |
|
282 |
|
284 |
As of version 1.18, Recoll can optionally store the raw terms, without
|
283 |
As of version 1.18, Recoll can optionally store the raw terms, without
|
285 |
accent stripping or case conversion. Expansions necessary for searches
|
284 |
accent stripping or case conversion. In this configuration, it is still
|
286 |
insensitive to case and/or diacritics are then performed when searching.
|
285 |
possible (and most common) for a query to be insensitive to case and/or
|
287 |
This is described in more detail in the section about index case and
|
286 |
diacritics. Appropriate term expansions are performed before actually
|
288 |
diacritics sensitivity.
|
287 |
accessing the main index. This is described in more detail in the section
|
|
|
288 |
about index case and diacritics sensitivity.
|
289 |
|
289 |
|
290 |
Recoll has many parameters which define exactly what to index, and how to
|
290 |
Recoll has many parameters which define exactly what to index, and how to
|
291 |
classify and decode the source documents. These are kept in configuration
|
291 |
classify and decode the source documents. These are kept in configuration
|
292 |
files. A default configuration is copied into a standard location (usually
|
292 |
files. A default configuration is copied into a standard location (usually
|
293 |
something like /usr/[local/]share/recoll/examples) during installation.
|
293 |
something like /usr/[local/]share/recoll/examples) during installation.
|
|
... |
|
... |
295 |
overridden by values that you set inside your personal configuration,
|
295 |
overridden by values that you set inside your personal configuration,
|
296 |
found by default in the .recoll sub-directory of your home directory. The
|
296 |
found by default in the .recoll sub-directory of your home directory. The
|
297 |
default configuration will index your home directory with default
|
297 |
default configuration will index your home directory with default
|
298 |
parameters and should be sufficient for giving Recoll a try, but you may
|
298 |
parameters and should be sufficient for giving Recoll a try, but you may
|
299 |
want to adjust it later, which can be done either by editing the text
|
299 |
want to adjust it later, which can be done either by editing the text
|
300 |
files or by using configuration menus in the recoll GUI
|
300 |
files or by using configuration menus in the recoll GUI. Some other
|
|
|
301 |
parameters affecting only the recoll GUI are stored in the standard
|
|
|
302 |
location defined by Qt.
|
301 |
|
303 |
|
302 |
The indexing process is started automatically the first time you execute
|
304 |
The indexing process is started automatically the first time you execute
|
303 |
the recoll GUI. Indexing can also be performed by executing the
|
305 |
the recoll GUI. Indexing can also be performed by executing the
|
304 |
recollindex command.
|
306 |
recollindex command.
|
305 |
|
307 |
|
|
... |
|
... |
344 |
they can be combined by setting up multiple indexes (ie: use periodic
|
346 |
they can be combined by setting up multiple indexes (ie: use periodic
|
345 |
indexing on a big documentation directory, and real time indexing on a
|
347 |
indexing on a big documentation directory, and real time indexing on a
|
346 |
small home directory). Monitoring a big file system tree can consume
|
348 |
small home directory). Monitoring a big file system tree can consume
|
347 |
significant system resources.
|
349 |
significant system resources.
|
348 |
|
350 |
|
|
|
351 |
The choice of method and the parameters used can be configured from the
|
|
|
352 |
recoll GUI: Preferences->Indexing schedule
|
|
|
353 |
|
349 |
----------------------------------------------------------------------
|
354 |
----------------------------------------------------------------------
|
350 |
|
355 |
|
351 |
2.1.2. Configurations, multiple indexes
|
356 |
2.1.2. Configurations, multiple indexes
|
352 |
|
357 |
|
353 |
The parameters describing what is to be indexed and local preferences are
|
358 |
The parameters describing what is to be indexed and local preferences are
|
|
... |
|
... |
387 |
|
392 |
|
388 |
Most file types, like HTML or word processing files, only hold one
|
393 |
Most file types, like HTML or word processing files, only hold one
|
389 |
document. Some file types, like email folders or zip archives, can hold
|
394 |
document. Some file types, like email folders or zip archives, can hold
|
390 |
many individually indexed documents, which may themselves be compound
|
395 |
many individually indexed documents, which may themselves be compound
|
391 |
ones. Such hierarchies can go quite deep, and Recoll can process, for
|
396 |
ones. Such hierarchies can go quite deep, and Recoll can process, for
|
392 |
example, an ms-word document stored as an attachment to an email message
|
397 |
example, a LibreOffice document stored as an attachment to an email
|
393 |
inside an email folder archived in a zip file...
|
398 |
message inside an email folder archived in a zip file...
|
394 |
|
399 |
|
395 |
Recoll indexing processes plain text, HTML, OpenDocument
|
400 |
Recoll indexing processes plain text, HTML, OpenDocument
|
396 |
(Open/LibreOffice), email formats, and a few others internally.
|
401 |
(Open/LibreOffice), email formats, and a few others internally.
|
397 |
|
402 |
|
398 |
Other file types (ie: postscript, pdf, ms-word, rtf ...) need external
|
403 |
Other file types (ie: postscript, pdf, ms-word, rtf ...) need external
|
|
... |
|
... |
436 |
and, (unless specified otherwise in recoll.conf) would look for the
|
441 |
and, (unless specified otherwise in recoll.conf) would look for the
|
437 |
index in ~/.indexes-email/xapiandb/.
|
442 |
index in ~/.indexes-email/xapiandb/.
|
438 |
|
443 |
|
439 |
Using multiple configuration directories and configuration options
|
444 |
Using multiple configuration directories and configuration options
|
440 |
allows you to tailor multiple configurations and indexes to handle
|
445 |
allows you to tailor multiple configurations and indexes to handle
|
441 |
whatever subset of the available data that you wish to make
|
446 |
whatever subset of the available data you wish to make searchable.
|
442 |
searchable.
|
|
|
443 |
|
447 |
|
444 |
* You can also specify a different storage location for the index by
|
448 |
* For a given configuration directory, you can specify a non-default
|
445 |
setting the dbdir parameter in the configuration file (see the
|
449 |
storage location for the index by setting the dbdir parameter in the
|
446 |
configuration section). This method would mainly be of use if you
|
450 |
configuration file (see the configuration section). This method would
|
447 |
wanted to keep the configuration directory in its default location,
|
451 |
mainly be of use if you wanted to keep the configuration directory in
|
448 |
but desired another location for the index, typically out of disk
|
452 |
its default location, but desired another location for the index,
|
449 |
occupation concerns.
|
453 |
typically out of disk occupation concerns.
|
450 |
|
454 |
|
451 |
The size of the index is determined by the size of the set of documents,
|
455 |
The size of the index is determined by the size of the set of documents,
|
452 |
but the ratio can vary a lot. For a typical mixed set of documents, the
|
456 |
but the ratio can vary a lot. For a typical mixed set of documents, the
|
453 |
index size will often be close to the data set size. In specific cases (a
|
457 |
index size will often be close to the data set size. In specific cases (a
|
454 |
set of compressed mbox files for example), the index can become much
|
458 |
set of compressed mbox files for example), the index can become much
|
|
... |
|
... |
504 |
|
508 |
|
505 |
2.3. Index configuration
|
509 |
2.3. Index configuration
|
506 |
|
510 |
|
507 |
Variables set inside the Recoll configuration files control which areas of
|
511 |
Variables set inside the Recoll configuration files control which areas of
|
508 |
the file system are indexed, and how files are processed. These variables
|
512 |
the file system are indexed, and how files are processed. These variables
|
509 |
can be set either by editing the text files or using the dialogs in the
|
513 |
can be set either by editing the text files or by using the dialogs in the
|
510 |
recoll GUI.
|
514 |
recoll GUI.
|
511 |
|
515 |
|
512 |
The first time you start recoll, you will be asked whether or not you
|
516 |
The first time you start recoll, you will be asked whether or not you
|
513 |
would like it to build the index. If you want to adjust the configuration
|
517 |
would like it to build the index. If you want to adjust the configuration
|
514 |
before indexing, just click Cancel at this point, which will get you into
|
518 |
before indexing, just click Cancel at this point, which will get you into
|
|
... |
|
... |
524 |
|
528 |
|
525 |
The applications needed to index file types other than text, HTML or email
|
529 |
The applications needed to index file types other than text, HTML or email
|
526 |
(ie: pdf, postscript, ms-word...) are described in the external packages
|
530 |
(ie: pdf, postscript, ms-word...) are described in the external packages
|
527 |
section.
|
531 |
section.
|
528 |
|
532 |
|
529 |
----------------------------------------------------------------------
|
533 |
As of Recoll 1.18 there are two incompatible types of Recoll indexes,
|
|
|
534 |
depending on the treatment of character case and diacritics. The next
|
|
|
535 |
section describes the two types in more detail.
|
530 |
|
536 |
|
|
|
537 |
----------------------------------------------------------------------
|
|
|
538 |
|
|
|
539 |
2.3.1. Multiple indexes
|
|
|
540 |
|
|
|
541 |
Multiple Recoll indexes can be created by using several configuration
|
|
|
542 |
directories which are usually set to index different areas of the file
|
|
|
543 |
system. A specific index can be selected for updating or searching, using
|
|
|
544 |
the RECOLL_CONFDIR environment variable or the -c option to recoll and
|
|
|
545 |
recollindex.
|
|
|
546 |
|
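The two ways of selecting an index mentioned above (the -c option and the RECOLL_CONFDIR environment variable) can be sketched as follows. The helper names are hypothetical, and the ~/.indexes-email directory is only an example location. Nothing is executed here: the functions just build the command line and environment that a caller could pass to subprocess.run.

```python
import os


def recollindex_command(confdir):
    """Build a recollindex command line that selects an index with -c."""
    return ["recollindex", "-c", os.path.expanduser(confdir)]


def recoll_environment(confdir):
    """Return a copy of the environment with RECOLL_CONFDIR set."""
    env = dict(os.environ)
    env["RECOLL_CONFDIR"] = os.path.expanduser(confdir)
    return env


if __name__ == "__main__":
    # Example configuration directory; adjust to your own setup.
    print(recollindex_command("~/.indexes-email"))
    print(recoll_environment("~/.indexes-email")["RECOLL_CONFDIR"])
```

Either form designates a single configuration directory, consistent with the rule that one recollindex instance updates one specific index.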
|
|
547 |
A typical usage scenario for the multiple index feature would be for a
|
|
|
548 |
system administrator to set up a central index for shared data, that you
|
|
|
549 |
choose to search or not in addition to your personal data. Of course,
|
|
|
550 |
there are other possibilities. There are many cases where you know the
|
|
|
551 |
subset of files that should be searched, and where narrowing the search
|
|
|
552 |
can improve the results. You can achieve approximately the same effect
|
|
|
553 |
with the directory filter in advanced search, but multiple indexes will
|
|
|
554 |
have much better performance and may be worth the trouble.
|
|
|
555 |
|
|
|
556 |
A recollindex program instance can only update one specific index.
|
|
|
557 |
|
|
|
558 |
The main index (defined by RECOLL_CONFDIR or -c) is always active. If this
|
|
|
559 |
is undesirable, you can set up your base configuration to index an empty
|
|
|
560 |
directory.
|
|
|
561 |
|
|
|
562 |
The different search interfaces (GUI, command line, ...) have different
|
|
|
563 |
methods to define the set of indexes to be used, see the appropriate
|
|
|
564 |
section.
|
|
|
565 |
|
|
|
566 |
If a set of multiple indexes are to be used together for searches, some
|
|
|
567 |
configuration parameters must be consistent among the set. These are
|
|
|
568 |
parameters which need to be the same when indexing and searching. As the
|
|
|
569 |
parameters come from the main configuration when searching, they need to
|
|
|
570 |
be compatible with what was set when creating the other indexes (which
|
|
|
571 |
came from their respective configuration directories).
|
|
|
572 |
|
|
|
573 |
Most importantly, all indexes to be queried concurrently must have the
|
|
|
574 |
same option concerning character case and diacritics stripping, but there
|
|
|
575 |
are other constraints. Most of the relevant parameters are described in
|
|
|
576 |
the linked section.
|
|
|
577 |
|
|
|
578 |
----------------------------------------------------------------------
|
|
|
579 |
|
531 |
2.3.1. Index case and diacritics sensitivity
|
580 |
2.3.2. Index case and diacritics sensitivity
|
532 |
|
581 |
|
533 |
As of Recoll version 1.18 you have a choice of building an index with
|
582 |
As of Recoll version 1.18 you have a choice of building an index with
|
534 |
terms stripped of character case and diacritics, or one with raw terms.
|
583 |
terms stripped of character case and diacritics, or one with raw terms.
|
535 |
For a source term of Résumé, the former will store resume, the latter
|
584 |
For a source term of Résumé, the former will store resume, the latter
|
536 |
Résumé.
|
585 |
Résumé.
|
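The difference between the two index types can be sketched with the Python standard library. This is an illustration of the idea only, under the assumption that stripping means removing combining marks and folding case. Recoll's own implementation may differ in its details, and the function name is hypothetical.

```python
import unicodedata


def strip_term(term):
    """Remove diacritics and fold case, as a stripped index conceptually does."""
    # NFD decomposition splits an accented letter into base letter + mark.
    decomposed = unicodedata.normalize("NFD", term)
    without_marks = "".join(
        c for c in decomposed if not unicodedata.combining(c)
    )
    return without_marks.lower()


if __name__ == "__main__":
    raw = "R\u00e9sum\u00e9"  # the accented source term
    print("raw index term:     ", raw)
    print("stripped index term:", strip_term(raw))
```

A raw index keeps the term unchanged, so a query can tell it apart from the unaccented, lower-case form. A stripped index stores only the reduced form.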
|
... |
|
... |
554 |
If the indexStripChars is not set, Recoll 1.18 creates a stripped index by
|
603 |
If the indexStripChars is not set, Recoll 1.18 creates a stripped index by
|
555 |
default, for compatibility with previous versions.
|
604 |
default, for compatibility with previous versions.
|
556 |
|
605 |
|
557 |
As a cost for added capability, a raw index will be slightly bigger than a
|
606 |
As a cost for added capability, a raw index will be slightly bigger than a
|
558 |
stripped one (around 10%). Also, searches will be more complex, so
|
607 |
stripped one (around 10%). Also, searches will be more complex, so
|
559 |
probably slightly slower, and the feature is still young, and a certain
|
608 |
probably slightly slower, and the feature is still young, so that a
|
560 |
amount of weirdness cannot be excluded.
|
609 |
certain amount of weirdness cannot be excluded.
|
561 |
|
610 |
|
562 |
----------------------------------------------------------------------
|
611 |
----------------------------------------------------------------------
|
563 |
|
612 |
|
564 |
2.3.2. The index configuration GUI
|
613 |
2.3.3. The index configuration GUI
|
565 |
|
614 |
|
566 |
Most parameters for a given index configuration can be set from a recoll
|
615 |
Most parameters for a given index configuration can be set from a recoll
|
567 |
GUI running on this configuration (either as default, or by setting
|
616 |
GUI running on this configuration (either as default, or by setting
|
568 |
RECOLL_CONFDIR or the -c option.)
|
617 |
RECOLL_CONFDIR or the -c option.)
|
569 |
|
618 |
|
|
... |
|
... |
795 |
* Simple search (the default, on the main screen) has a single entry
|
844 |
* Simple search (the default, on the main screen) has a single entry
|
796 |
field where you can enter multiple words.
|
845 |
field where you can enter multiple words.
|
797 |
|
846 |
|
798 |
* Advanced search (a panel accessed through the Tools menu or the
|
847 |
* Advanced search (a panel accessed through the Tools menu or the
|
799 |
toolbox bar icon) has multiple entry fields, which you may use to
|
848 |
toolbox bar icon) has multiple entry fields, which you may use to
|
800 |
build a logical condition, with additional filtering on file type and
|
849 |
build a logical condition, with additional filtering on file type,
|
801 |
location in the file system.
|
850 |
location in the file system, modification date, and size.
|
802 |
|
851 |
|
803 |
In most cases, you can enter the terms as you think them, even if they
|
852 |
In most cases, you can enter the terms as you think them, even if they
|
804 |
contain embedded punctuation or other non-textual characters. For example,
|
853 |
contain embedded punctuation or other non-textual characters. For example,
|
805 |
Recoll can handle things like email addresses, or arbitrary cut and paste
|
854 |
Recoll can handle things like email addresses, or arbitrary cut and paste
|
806 |
from another text window, punctuation and all.
|
855 |
from another text window, punctuation and all.
|
|
... |
|
... |
830 |
terms mode which will ignore such directives. Any term will search for
|
879 |
terms mode which will ignore such directives. Any term will search for
|
831 |
documents where at least one of the terms appears.
|
880 |
documents where at least one of the terms appears.
|
832 |
|
881 |
|
833 |
The Query Language features are described in a separate section.
|
882 |
The Query Language features are described in a separate section.
|
834 |
|
883 |
|
835 |
File name will specifically look for file names. The entry will be split
|
|
|
836 |
at white space characters, and each fragment will be separately expanded,
|
|
|
837 |
then the search will be for file names matching all fragments (this is new
|
|
|
838 |
in 1.15, older releases did an OR of the whole thing which did not make
|
|
|
839 |
sense). Things to know:
|
|
|
840 |
|
|
|
841 |
* The search is case- and accent-insensitive.
|
|
|
842 |
|
|
|
843 |
* Fragments without any wild card character and not capitalized will be
|
|
|
844 |
prepended and appended with '*' (ie: etc -> *etc*, but Etc -> etc). Of
|
|
|
845 |
course it does not make sense to have multiple fragments if one of
|
|
|
846 |
them is capitalized (as this one will require an exact match).
|
|
|
847 |
|
|
|
848 |
* If you want to search for a pattern including white space, use double
|
|
|
849 |
quotes (ie: "admin note*").
|
|
|
850 |
|
|
|
851 |
* If you have a big index (many files), excessively generic fragments
|
|
|
852 |
may result in inefficient searches.
|
|
|
853 |
|
|
|
854 |
* As an example, inst recoll would match recollinstall.in (and quite a
|
|
|
855 |
few others...).
|
|
|
856 |
|
|
|
857 |
The point of having a separate file name search is that wild card
|
|
|
858 |
expansion can be performed more efficiently on a relatively small subset
|
|
|
859 |
of the index (allowing wild cards on the left of terms without excessive
|
|
|
860 |
penalty).
|
|
|
861 |
|
|
|
862 |
All search modes allow wildcards inside terms (*, ?, []). You may want to
|
884 |
All search modes allow wildcards inside terms (*, ?, []). You may want to
|
863 |
have a look at the section about wildcards for more information about
|
885 |
have a look at the section about wildcards for more information about
|
864 |
this.
|
886 |
this.
|
865 |
|
887 |
|
|
|
888 |
File name will specifically look for file names. The point of having a
|
|
|
889 |
separate file name search is that wild card expansion can be performed
|
|
|
890 |
more efficiently on a small subset of the index (allowing wild cards on
|
|
|
891 |
the left of terms without excessive penalty). Things to know:
|
|
|
892 |
|
|
|
893 |
* White space in the entry should match white space in the file name,
|
|
|
894 |
and is not treated specially.
|
|
|
895 |
|
|
|
896 |
* The search is insensitive to character case and accents, independently
|
|
|
897 |
of the type of index.
|
|
|
898 |
|
|
|
899 |
* An entry without any wild card character and not capitalized will be
|
|
|
900 |
prepended and appended with '*' (ie: etc -> *etc*, but Etc -> etc).
|
|
|
901 |
|
|
|
902 |
* If you have a big index (many files), excessively generic fragments
|
|
|
903 |
may result in inefficient searches.
|
|
|
904 |
|
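The wild card rule for file name entries listed above can be sketched as a small function. The function name is hypothetical, and only case folding is shown (accent folding is left out to keep the example short). It is an illustration of the documented rule, not Recoll's code.

```python
WILDCARD_CHARS = set("*?[")


def expand_file_name_entry(entry):
    """Apply the documented rule: etc -> *etc*, but Etc -> etc."""
    has_wildcard = any(c in WILDCARD_CHARS for c in entry)
    capitalized = entry[:1].isupper()
    if has_wildcard or capitalized:
        # Used as typed, apart from case folding (the search is
        # case-insensitive).
        return entry.lower()
    # No wild card and not capitalized: match the fragment anywhere.
    return "*" + entry.lower() + "*"


if __name__ == "__main__":
    for entry in ("etc", "Etc", "rec*.in"):
        print(entry, "->", expand_file_name_entry(entry))
```

Capitalizing the entry is therefore a quick way to ask for an exact file name instead of a substring match.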
866 |
You can search for exact phrases (adjacent words in a given order) by
|
905 |
You can search for exact phrases (adjacent words in a given order) by
|
867 |
enclosing the input inside double quotes. Ex: "virtual reality".
|
906 |
enclosing the input inside double quotes. Ex: "virtual reality".
|
868 |
|
907 |
|
869 |
Character case has no influence on search, except that you can disable
|
908 |
When using a stripped index, character case has no influence on search,
|
870 |
stem expansion for any term by capitalizing it. Ie: a search for floor
|
909 |
except that you can disable stem expansion for any term by capitalizing
|
871 |
will also normally look for flooring, floored, etc., but a search for
|
910 |
it. Ie: a search for floor will also normally look for flooring, floored,
|
872 |
Floor will only look for floor, in any character case. Stemming can also
|
911 |
etc., but a search for Floor will only look for floor, in any character
|
873 |
be disabled globally in the preferences.
|
912 |
case. Stemming can also be disabled globally in the preferences. When
|
|
|
913 |
using a raw index, the rules are a bit more complicated.
|
874 |
|
914 |
|
875 |
Recoll remembers the last few searches that you performed. You can use the
|
915 |
Recoll remembers the last few searches that you performed. You can use the
|
876 |
simple search text entry widget (a combobox) to recall them (click on the
|
916 |
simple search text entry widget (a combobox) to recall them (click on the
|
877 |
thing at the right of the text field). Please note, however, that only the
|
917 |
thing at the right of the text field). Please note, however, that only the
|
878 |
search texts are remembered, not the mode (all/any/file name).
|
918 |
search texts are remembered, not the mode (all/any/file name).
|
|
... |
|
... |
900 |
the main list window.
|
940 |
the main list window.
|
901 |
|
941 |
|
902 |
By default, the document list is presented in order of relevance (how well
|
942 |
By default, the document list is presented in order of relevance (how well
|
903 |
the system estimates that the document matches the query). You can sort
|
943 |
the system estimates that the document matches the query). You can sort
|
904 |
the result by ascending or descending date by using the vertical arrows in
|
944 |
the result by ascending or descending date by using the vertical arrows in
|
905 |
the toolbar (the old sort tool is gone after release 1.15, because the new
|
945 |
the toolbar.
|
906 |
result table has much better capability).
|
|
|
907 |
|
946 |
|
908 |
Clicking on the Preview link for an entry will open an internal preview
|
947 |
Clicking on the Preview link for an entry will open an internal preview
|
909 |
window for the document. Further Preview clicks for the same search will
|
948 |
window for the document. Further Preview clicks for the same search will
|
910 |
open tabs in the existing preview window. You can use Shift+Click to force
|
949 |
open tabs in the existing preview window. You can use Shift+Click to force
|
911 |
the creation of another preview window, which may be useful to view the
|
950 |
the creation of another preview window, which may be useful to view the
|
|
... |
|
... |
1243 |
Weird things will probably happen if languages are mixed up.
|
1282 |
Weird things will probably happen if languages are mixed up.
|
1244 |
|
1283 |
|
1245 |
Note that in cases where Recoll does not know the beginning of the string
|
1284 |
Note that in cases where Recoll does not know the beginning of the string
|
1246 |
to search for (ie a wildcard expression like *coll), the expansion can
|
1285 |
to search for (ie a wildcard expression like *coll), the expansion can
|
1247 |
take quite a long time because the full index term list will have to be
|
1286 |
take quite a long time because the full index term list will have to be
|
1248 |
processed. The expansion is currently limited at 200 results for wildcards
|
1287 |
processed. The expansion is currently limited at 10000 results for
|
1249 |
and regular expressions.
|
1288 |
wildcards and regular expressions.
|
1250 |
|
1289 |
|
1251 |
Double-clicking on a term in the result list will insert it into the
|
1290 |
Double-clicking on a term in the result list will insert it into the
|
1252 |
simple search entry field. You can also cut/paste between the result list
|
1291 |
simple search entry field. You can also cut/paste between the result list
|
1253 |
and any entry field (the end of lines will be taken care of).
|
1292 |
and any entry field (the end of lines will be taken care of).
|
1254 |
|
1293 |
|
1255 |
----------------------------------------------------------------------
|
1294 |
----------------------------------------------------------------------
|
1256 |
|
1295 |
|
1257 |
3.1.7. Multiple databases
|
1296 |
3.1.7. Multiple indexes
|
1258 |
|
1297 |
|
1259 |
See the section describing the use of multiple indexes for generalities.
|
1298 |
See the section describing the use of multiple indexes for generalities.
|
1260 |
Only the aspects concerning the recoll GUI are described here.
|
1299 |
Only the aspects concerning the recoll GUI are described here.
|
1261 |
|
1300 |
|
1262 |
A recoll program instance is always associated with a specific index,
|
1301 |
A recoll program instance is always associated with a specific index,
|
|
... |
|
... |
1328 |
It is also possible to hide duplicate entries inside the result list
|
1367 |
It is also possible to hide duplicate entries inside the result list
|
1329 |
(documents with the exact same contents as the displayed one). The test of
|
1368 |
(documents with the exact same contents as the displayed one). The test of
|
1330 |
identity is based on an MD5 hash of the document container, not only of
|
1369 |
identity is based on an MD5 hash of the document container, not only of
|
1331 |
the text contents (so that ie, a text document with an image added will
|
1370 |
the text contents (so that ie, a text document with an image added will
|
1332 |
not be a duplicate of the text only). Duplicates hiding is controlled by
|
1371 |
not be a duplicate of the text only). Duplicates hiding is controlled by
|
1333 |
an entry in the Query configuration dialog, and is off by default.
|
1372 |
an entry in the GUI configuration dialog, and is off by default.
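
As a rough illustration of the identity test described above, the
following Python sketch hashes whole file containers with MD5 and groups
the identical ones. It is not Recoll's own code: the function names
container_md5 and group_duplicates are made up for this example.

```python
import hashlib
from collections import defaultdict


def container_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of the whole file (the container),
    not only of its extracted text."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(paths):
    """Map each digest to the list of paths sharing it (hypothetical helper)."""
    groups = defaultdict(list)
    for path in paths:
        groups[container_md5(path)].append(path)
    return dict(groups)
```

Because the hash covers the container, a text document and the same text
with an image added end up in different groups, as explained above.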
|
1334 |
|
1373 |
|
1335 |
----------------------------------------------------------------------
|
1374 |
----------------------------------------------------------------------
|
1336 |
|
1375 |
|
1337 |
3.1.10. Search tips, shortcuts
|
1376 |
3.1.10. Search tips, shortcuts
|
1338 |
|
1377 |
|
|
... |
|
... |
1449 |
|
1488 |
|
1450 |
----------------------------------------------------------------------
|
1489 |
----------------------------------------------------------------------
|
1451 |
|
1490 |
|
1452 |
3.1.11. Customizing the search interface
|
1491 |
3.1.11. Customizing the search interface
|
1453 |
|
1492 |
|
1454 |
You can customize some aspects of the search interface by using the Query
|
1493 |
You can customize some aspects of the search interface by using the GUI
|
1455 |
configuration entry in the Preferences menu.
|
1494 |
configuration entry in the Preferences menu.
|
1456 |
|
1495 |
|
1457 |
There are several tabs in the dialog, dealing with the interface itself,
|
1496 |
There are several tabs in the dialog, dealing with the interface itself,
|
1458 |
the parameters used for searching and returning results, and what indexes
|
1497 |
the parameters used for searching and returning results, and what indexes
|
1459 |
are searched.
|
1498 |
are searched.
|
|
... |
|
... |
1480 |
* Prefer HTML to plain text for preview if set, Recoll will display HTML
|
1519 |
* Prefer HTML to plain text for preview if set, Recoll will display HTML
|
1481 |
as such inside the preview window. If this causes problems with the Qt
|
1520 |
as such inside the preview window. If this causes problems with the Qt
|
1482 |
HTML display, you can uncheck it to display the plain text version
|
1521 |
HTML display, you can uncheck it to display the plain text version
|
1483 |
instead.
|
1522 |
instead.
|
1484 |
|
1523 |
|
1485 |
* Use <PRE> tags instead of <BR> to display plain text as HTML in
|
1524 |
* Plain text to HTML line style: when displaying plain text inside the
|
1486 |
preview: when displaying plain text inside the preview window, Recoll
|
1525 |
preview window, Recoll tries to preserve some of the original text
|
1487 |
tries to preserve some of the original text line breaks and
|
|
|
1488 |
indentation. It can either use PRE HTML tags, which will well preserve
|
1526 |
line breaks and indentation. It can either use PRE HTML tags, which
|
1489 |
the indentation but will force horizontal scrolling for long lines, or
|
1527 |
will well preserve the indentation but will force horizontal scrolling
|
1490 |
use BR tags to break at the original line breaks, which will let the
|
1528 |
for long lines, or use BR tags to break at the original line breaks,
|
1491 |
editor introduce other line breaks according to the window width, but
|
1529 |
which will let the editor introduce other line breaks according to the
|
1492 |
will lose some of the original indentation.
|
1530 |
window width, but will lose some of the original indentation. The
|
|
|
1531 |
third option has been available in recent releases and is probably now
|
|
|
1532 |
the best one: use PRE tags with line wrapping.
|
1493 |
|
1533 |
|
1494 |
* Use desktop preferences to choose document editor: if this is checked,
|
1534 |
* Use desktop preferences to choose document editor: if this is checked,
|
1495 |
the xdg-open utility will be used to open files when you click the
|
1535 |
the xdg-open utility will be used to open files when you click the
|
1496 |
Open link in the result list, instead of the application defined in
|
1536 |
Open link in the result list, instead of the application defined in
|
1497 |
mimeview. xdg-open will in term use your desktop preferences to choose
|
1537 |
mimeview. xdg-open will in turn use your desktop preferences to choose
|
|
... |
|
... |
1499 |
|
1539 |
|
1500 |
* Exceptions: when using the desktop preferences for opening documents,
|
1540 |
* Exceptions: when using the desktop preferences for opening documents,
|
1501 |
these are mime types that will still be opened according to Recoll
|
1541 |
these are mime types that will still be opened according to Recoll
|
1502 |
preferences. This is useful for passing parameters like page numbers
|
1542 |
preferences. This is useful for passing parameters like page numbers
|
1503 |
or search strings to applications that support them (e.g. evince).
|
1543 |
or search strings to applications that support them (e.g. evince).
|
|
|
1544 |
This cannot be done with xdg-open which only supports passing one
|
|
|
1545 |
parameter.
|
1504 |
|
1546 |
|
1505 |
* Choose editor applications this will let you choose the command
|
1547 |
* Choose editor applications this will let you choose the command
|
1506 |
started by the Open links inside the result list, for specific
|
1548 |
started by the Open links inside the result list, for specific
|
1507 |
document types.
|
1549 |
document types.
|
1508 |
|
1550 |
|
|
... |
|
... |
1512 |
* Auto-start simple search on white space entry: if this is checked, a
|
1554 |
* Auto-start simple search on white space entry: if this is checked, a
|
1513 |
search will be executed each time you enter a space in the simple
|
1555 |
search will be executed each time you enter a space in the simple
|
1514 |
search input field. This lets you look at the result list as you enter
|
1556 |
search input field. This lets you look at the result list as you enter
|
1515 |
new terms. This is off by default, you may like it or not...
|
1557 |
new terms. This is off by default, you may like it or not...
|
1516 |
|
1558 |
|
1517 |
* Start with advanced search dialog open and Start with sort dialog
|
1559 |
* Start with advanced search dialog open: if you use this dialog
|
1518 |
open: If you use these dialogs all the time, checking these entries
|
1560 |
frequently, checking this entry will get it to open when recoll
|
1519 |
will get them to open when recoll starts.
|
1561 |
starts.
|
1520 |
|
1562 |
|
1521 |
* Remember sort activation state if set, Recoll will remember the sort
|
1563 |
* Remember sort activation state if set, Recoll will remember the sort
|
1522 |
tool stat between invocations. It normally starts with sorting
|
1564 |
tool state between invocations. It normally starts with sorting
|
1523 |
disabled.
|
1565 |
disabled.
|
1524 |
|
1566 |
|
|
... |
|
... |
1533 |
|
1575 |
|
1534 |
* Edit result list paragraph format string: allows you to change the
|
1576 |
* Edit result list paragraph format string: allows you to change the
|
1535 |
presentation of each result list entry. See the result list
|
1577 |
presentation of each result list entry. See the result list
|
1536 |
customisation section.
|
1578 |
customisation section.
|
1537 |
|
1579 |
|
1538 |
* Edit result page html header insert: allows you to define text
|
1580 |
* Edit result page HTML header insert: allows you to define text
|
1539 |
inserted at the end of the result page html header. More detail in the
|
1581 |
inserted at the end of the result page HTML header. More detail in the
|
1540 |
result list customisation section.
|
1582 |
result list customisation section.
|
1541 |
|
1583 |
|
1542 |
* Date format: allows specifying the format used for displaying dates
|
1584 |
* Date format: allows specifying the format used for displaying dates
|
1543 |
inside the result list. This should be specified as an strftime()
|
1585 |
inside the result list. This should be specified as an strftime()
|
1544 |
string (man strftime).
|
1586 |
string (man strftime).
|
|
... |
|
... |
1574 |
* Replace abstracts from documents: this decides if we should synthesize
|
1616 |
* Replace abstracts from documents: this decides if we should synthesize
|
1575 |
and display an abstract in place of an explicit abstract found within
|
1617 |
and display an abstract in place of an explicit abstract found within
|
1576 |
the document itself.
|
1618 |
the document itself.
|
1577 |
|
1619 |
|
1578 |
* Dynamically build abstracts: this decides if Recoll tries to build
|
1620 |
* Dynamically build abstracts: this decides if Recoll tries to build
|
1579 |
document abstracts when displaying the result list. Abstracts are
|
1621 |
document abstracts (lists of snippets) when displaying the result
|
1580 |
constructed by taking context from the document information, around
|
1622 |
list. Abstracts are constructed by taking context from the document
|
1581 |
the search terms. This can slow down result list display significantly
|
1623 |
information, around the search terms.
|
1582 |
for big documents, and you may want to turn it off.
|
|
|
1583 |
|
1624 |
|
1584 |
* Synthetic abstract size: adjust to taste...
|
1625 |
* Synthetic abstract size: adjust to taste...
|
1585 |
|
1626 |
|
1586 |
* Synthetic abstract context words: how many words should be displayed
|
1627 |
* Synthetic abstract context words: how many words should be displayed
|
1587 |
around each term occurrence.
|
1628 |
around each term occurrence.
|
|
... |
|
... |
1613 |
The result list presentation can be exhaustively customized by adjusting
|
1654 |
The result list presentation can be exhaustively customized by adjusting
|
1614 |
two elements:
|
1655 |
two elements:
|
1615 |
|
1656 |
|
1616 |
* The paragraph format
|
1657 |
* The paragraph format
|
1617 |
|
1658 |
|
1618 |
* Html code inside the header section
|
1659 |
* HTML code inside the header section
|
1619 |
|
1660 |
|
1620 |
These can be edited from the Result list tab of the Query configuration.
|
1661 |
These can be edited from the Result list tab of the GUI configuration.
|
1621 |
|
1662 |
|
1622 |
Newer versions of Recoll (from 1.17) use a WebKit HTML object by default
|
1663 |
Newer versions of Recoll (from 1.17) use a WebKit HTML object by default
|
1623 |
(this may be disabled at build time), and total customisation is possible
|
1664 |
(this may be disabled at build time), and total customisation is possible
|
1624 |
with full support for CSS and Javascript. Conversely, there are limits to
|
1665 |
with full support for CSS and Javascript. Conversely, there are limits to
|
1625 |
what you can do with the older Qt QTextBrowser, but still, it is possible
|
1666 |
what you can do with the older Qt QTextBrowser, but still, it is possible
|
|
... |
|
... |
1641 |
|
1682 |
|
1642 |
* %A. Abstract
|
1683 |
* %A. Abstract
|
1643 |
|
1684 |
|
1644 |
* %D. Date
|
1685 |
* %D. Date
|
1645 |
|
1686 |
|
1646 |
* %E. Precooked Snippets link (will only appear for documents indexed
|
|
|
1647 |
with page numbers)
|
|
|
1648 |
|
|
|
1649 |
* %I. Icon image name. This is normally determined from the mime type.
|
1687 |
* %I. Icon image name. This is normally determined from the mime type.
|
1650 |
The associations are defined inside the mimeconf configuration file.
|
1688 |
The associations are defined inside the mimeconf configuration file.
|
1651 |
If a thumbnail for the file is found at the standard Freedesktop
|
1689 |
If a thumbnail for the file is found at the standard Freedesktop
|
1652 |
location, this will be displayed instead.
|
1690 |
location, this will be displayed instead.
|
1653 |
|
1691 |
|
1654 |
* %K. Keywords (if any)
|
1692 |
* %K. Keywords (if any)
|
1655 |
|
1693 |
|
1656 |
* %L. Precooked Preview and Edit links
|
1694 |
* %L. Precooked Preview, Edit, and possibly Snippets links
|
1657 |
|
1695 |
|
1658 |
* %M. Mime type
|
1696 |
* %M. Mime type
|
1659 |
|
1697 |
|
1660 |
* %N. result Number inside the result page
|
1698 |
* %N. result Number inside the result page
|
1661 |
|
1699 |
|
|
... |
|
... |
1667 |
|
1705 |
|
1668 |
* %t. Title or Filename if not set.
|
1706 |
* %t. Title or Filename if not set.
|
1669 |
|
1707 |
|
1670 |
* %U. Url
|
1708 |
* %U. Url
|
1671 |
|
1709 |
|
1672 |
The format of the Preview and Edit links is <a href="P%N"> and <a
|
1710 |
The format of the Preview, Edit, and Snippets links is <a href="P%N">, <a
|
1673 |
href="E%N"> where docnum (%N) expands to the document number inside the
|
1711 |
href="E%N"> and <a href="A%N"> where docnum (%N) expands to the document
|
1674 |
result page).
|
1712 |
number inside the result page.
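
To make the substitutions more concrete, here is a small Python sketch of
how a paragraph format string could be expanded. It only illustrates the
idea: expand_format is a hypothetical helper, not part of Recoll, and it
handles nothing beyond the single-letter codes and the %(fieldname) form
described in this section.

```python
import re


def expand_format(fmt, values, fields):
    """Expand %X single-letter codes and %(fieldname) references.

    values maps code letters (e.g. "N", "D") to strings, fields maps
    stored field names to strings. Both are supplied by the caller.
    """
    def replace(match):
        field_name = match.group(1)
        if field_name is not None:
            # %(fieldname): unknown or non-stored fields expand to nothing.
            return fields.get(field_name, "")
        # %X: leave unknown codes untouched.
        return values.get(match.group(2), match.group(0))

    return re.sub(r"%(?:\(([^)]+)\)|([A-Za-z]))", replace, fmt)


line = expand_format(
    '<a href="P%N">%t</a> %D %(author)',
    {"N": "3", "t": "Eugenie Grandet", "D": "2012-01-01"},
    {"author": "balzac"},
)
```

With these made-up values, the Preview link target becomes P3, that is,
the letter P followed by the document number inside the result page.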
|
1675 |
|
1713 |
|
1676 |
In addition to the predefined values above, all strings like %(fieldname)
|
1714 |
In addition to the predefined values above, all strings like %(fieldname)
|
1677 |
will be replaced by the value of the field named fieldname for this
|
1715 |
will be replaced by the value of the field named fieldname for this
|
1678 |
document. Only stored fields can be accessed in this way, the value of
|
1716 |
document. Only stored fields can be accessed in this way, the value of
|
1679 |
indexed but not stored fields is not known at this point in the search
|
1717 |
indexed but not stored fields is not known at this point in the search
|
|
... |
|
... |
1840 |
The query language processor is activated in the GUI simple search entry
|
1878 |
The query language processor is activated in the GUI simple search entry
|
1841 |
when the search mode selector is set to Query Language. It can also be
|
1879 |
when the search mode selector is set to Query Language. It can also be
|
1842 |
used with the KIO slave or the command line search. It broadly has the
|
1880 |
used with the KIO slave or the command line search. It broadly has the
|
1843 |
same capabilities as the complex search interface in the GUI.
|
1881 |
same capabilities as the complex search interface in the GUI.
|
1844 |
|
1882 |
|
1845 |
The language is roughly based on the (seemingly defunct) Xesam user search
|
1883 |
The language is based on the (seemingly defunct) Xesam user search
|
1846 |
language specification.
|
1884 |
language specification.
|
1847 |
|
1885 |
|
1848 |
If the results of a query language search puzzle you and you doubt what
|
1886 |
If the results of a query language search puzzle you and you doubt what
|
1849 |
has been actually searched for, you can use the GUI Show Query link at the
|
1887 |
has been actually searched for, you can use the GUI Show Query link at the
|
1850 |
top of the result list to check the exact query which was finally executed
|
1888 |
top of the result list to check the exact query which was finally executed
|
|
... |
|
... |
1860 |
ie: the From: header, for an email message), and containing either beatles
|
1898 |
ie: the From: header, for an email message), and containing either beatles
|
1861 |
or lennon and either live or unplugged but not potatoes (in any part of
|
1899 |
or lennon and either live or unplugged but not potatoes (in any part of
|
1862 |
the document).
|
1900 |
the document).
|
1863 |
|
1901 |
|
1864 |
An element is composed of an optional field specification, and a value,
|
1902 |
An element is composed of an optional field specification, and a value,
|
|
|
1903 |
separated by a colon (the field separator is the last colon in the
|
1865 |
separated by a colon. Example: Beatles, author:balzac, dc:title:grandet
|
1904 |
element). Example: Eugenie, author:balzac, dc:title:grandet
|
1866 |
|
1905 |
|
1867 |
The colon, if present, means "contains". Xesam defines other relations,
|
1906 |
The colon, if present, means "contains". Xesam defines other relations,
|
1868 |
which are not supported for now.
|
1907 |
which are mostly unsupported for now (except in special cases, described
|
|
|
1908 |
further down).
|
1869 |
|
1909 |
|
1870 |
All elements in the search entry are normally combined with an implicit
|
1910 |
All elements in the search entry are normally combined with an implicit
|
1871 |
AND. It is possible to specify that elements be OR'ed instead, as in
|
1911 |
AND. It is possible to specify that elements be OR'ed instead, as in
|
1872 |
Beatles OR Lennon. The OR must be entered literally (capitals), and it has
|
1912 |
Beatles OR Lennon. The OR must be entered literally (capitals), and it has
|
1873 |
priority over the AND associations: word1 word2 OR word3 means word1 AND
|
1913 |
priority over the AND associations: word1 word2 OR word3 means word1 AND
|
1874 |
(word2 OR word3) not (word1 AND word2) OR word3. Do not enter explicit
|
1914 |
(word2 OR word3) not (word1 AND word2) OR word3. Explicit parentheses are
|
1875 |
parenthesis, they are not supported for now.
|
1915 |
not supported.
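
The precedence rule can be sketched in a few lines of Python. This is a
toy model written for this explanation, not the Recoll parser: it ignores
field specifications, negation and quoted phrases, and parse_query is a
hypothetical name.

```python
def parse_query(query):
    """Return a list of OR-groups; the groups are implicitly ANDed."""
    groups = []
    expect_or_operand = False
    for token in query.split():
        if token == "OR":
            # Only a literal, capitalised OR is treated as an operator.
            # A leading OR has nothing to attach to and is ignored.
            expect_or_operand = bool(groups)
            continue
        if expect_or_operand:
            # OR binds tighter than the implicit AND: extend the last group.
            groups[-1].append(token)
            expect_or_operand = False
        else:
            groups.append([token])
    return groups
```

For word1 word2 OR word3 this yields two groups, [word1] and
[word2, word3], which reads as word1 AND (word2 OR word3).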
|
1876 |
|
1916 |
|
1877 |
An element preceded by a - specifies a term that should not appear. Pure
|
1917 |
An element preceded by a - specifies a term that should not appear. Pure
|
1878 |
negative queries are forbidden.
|
1918 |
negative queries are forbidden.
|
1879 |
|
1919 |
|
1880 |
As usual, words inside quotes define a phrase (the order of words is
|
1920 |
As usual, words inside quotes define a phrase (the order of words is
|
|
... |
|
... |
2101 |
|
2141 |
|
2102 |
* Using a wildcard character at the beginning of a word can make for a
|
2142 |
* Using a wildcard character at the beginning of a word can make for a
|
2103 |
slow search because Recoll will have to scan the whole index term list
|
2143 |
slow search because Recoll will have to scan the whole index term list
|
2104 |
to find the matches.
|
2144 |
to find the matches.
|
2105 |
|
2145 |
|
|
|
2146 |
* When working with a raw index (preserving character case and
|
|
|
2147 |
diacritics), the literal part of a wildcard expression will be matched
|
|
|
2148 |
exactly for case and diacritics.
|
|
|
2149 |
|
2106 |
* Using a * at the end of a word can produce more matches than you would
|
2150 |
* Using a * at the end of a word can produce more matches than you would
|
2107 |
think, and strange search results. You can use the term explorer tool
|
2151 |
think, and strange search results. You can use the term explorer tool
|
2108 |
to check what completions exist for a given term. You can also see
|
2152 |
to check what completions exist for a given term. You can also see
|
2109 |
exactly what search was performed by clicking on the link at the top
|
2153 |
exactly what search was performed by clicking on the link at the top
|
2110 |
of the result list. In general, for natural language terms, stem
|
2154 |
of the result list. In general, for natural language terms, stem
|
|
... |
|
... |
2134 |
This feature can also be used with an actual phrase search, but in this
|
2178 |
This feature can also be used with an actual phrase search, but in this
|
2135 |
case, the distance applies to the whole phrase and anchor, so that, for
|
2179 |
case, the distance applies to the whole phrase and anchor, so that, for
|
2136 |
example, bla bla my unexpected term at the beginning of the text would be
|
2180 |
example, bla bla my unexpected term at the beginning of the text would be
|
2137 |
a match for "^my term"o5.
|
2181 |
a match for "^my term"o5.
|
2138 |
|
2182 |
|
|
|
2183 |
Anchored searches can be very useful for searches inside somewhat
|
|
|
2184 |
structured documents like scientific articles, in case explicit metadata
|
|
|
2185 |
has not been supplied (a most frequent case), for example for looking for
|
|
|
2186 |
matches inside the abstract or the list of authors (which occur at the top
|
|
|
2187 |
of the document).
|
|
|
2188 |
|
2139 |
----------------------------------------------------------------------
|
2189 |
----------------------------------------------------------------------
|
2140 |
|
2190 |
|
2141 |
3.7. Desktop integration
|
2191 |
3.7. Desktop integration
|
2142 |
|
2192 |
|
2143 |
Being independant of the desktop type has its drawbacks: Recoll desktop
|
2193 |
Being independent of the desktop type has its drawbacks: Recoll desktop
|
2144 |
integration is minimal. Here follow a few things that may help.
|
2194 |
integration is minimal. However there are a few tools available:
|
|
|
2195 |
|
|
|
2196 |
* The KDE KIO Slave was described in a previous section.
|
|
|
2197 |
|
|
|
2198 |
* If you use a recent version of Ubuntu Linux, you may find the Ubuntu
|
|
|
2199 |
Unity Lens module useful.
|
|
|
2200 |
|
|
|
2201 |
* There is also an independently developed Krunner plugin.
|
|
|
2202 |
|
|
|
2203 |
Here follow a few other things that may help.
|
2145 |
|
2204 |
|
2146 |
----------------------------------------------------------------------
|
2205 |
----------------------------------------------------------------------
|
2147 |
|
2206 |
|
2148 |
3.7.1. Hotkeying recoll
|
2207 |
3.7.1. Hotkeying recoll
|
2149 |
|
2208 |
|
|
... |
|
... |
2153 |
just this. The detailed instructions are on this wiki page.
|
2212 |
just this. The detailed instructions are on this wiki page.
|
2154 |
|
2213 |
|
2155 |
----------------------------------------------------------------------
|
2214 |
----------------------------------------------------------------------
|
2156 |
|
2215 |
|
2157 |
3.7.2. The KDE Kicker Recoll applet
|
2216 |
3.7.2. The KDE Kicker Recoll applet
|
|
|
2217 |
|
|
|
2218 |
This is probably obsolete now. Anyway:
|
2158 |
|
2219 |
|
2159 |
The Recoll source tree contains the source code to the recoll_applet, a
|
2220 |
The Recoll source tree contains the source code to the recoll_applet, a
|
2160 |
small application derived from the find_applet. This can be used to add a
|
2221 |
small application derived from the find_applet. This can be used to add a
|
2161 |
small Recoll launcher to the KDE panel.
|
2222 |
small Recoll launcher to the KDE panel.
|
2162 |
|
2223 |
|
|
... |
|
... |
2175 |
a new recoll GUI instance every time (even if it is already running). You
|
2236 |
a new recoll GUI instance every time (even if it is already running). You
|
2176 |
may find it useful anyway.
|
2237 |
may find it useful anyway.
|
2177 |
|
2238 |
|
2178 |
----------------------------------------------------------------------
|
2239 |
----------------------------------------------------------------------
|
2179 |
|
2240 |
|
2180 |
3.8. Multiple databases
|
|
|
2181 |
|
|
|
2182 |
Multiple Recoll databases or indexes can be created by using several
|
|
|
2183 |
configuration directories which are usually set to index different areas
|
|
|
2184 |
of the file system. A specific index can be selected for updating or
|
|
|
2185 |
searching, using the RECOLL_CONFDIR environment variable or the -c option
|
|
|
2186 |
to recoll and recollindex.
|
|
|
2187 |
|
|
|
2188 |
A typical usage scenario for the multiple index feature would be for a
|
|
|
2189 |
system administrator to set up a central index for shared data, that you
|
|
|
2190 |
choose to search or not in addition to your personal data. Of course,
|
|
|
2191 |
there are other possibilities. There are many cases where you know the
|
|
|
2192 |
subset of files that should be searched, and where narrowing the search
|
|
|
2193 |
can improve the results. You can achieve approximately the same effect
|
|
|
2194 |
with the directory filter in advanced search, but multiple indexes will
|
|
|
2195 |
have much better performance and may be worth the trouble.
|
|
|
2196 |
|
|
|
2197 |
A recollindex program instance can only update one specific index.
|
|
|
2198 |
|
|
|
2199 |
The main index (defined by RECOLL_CONFDIR or -c) is always active. If this
|
|
|
2200 |
is undesirable, you can set up your base configuration to index an empty
|
|
|
2201 |
directory.
|
|
|
2202 |
|
|
|
2203 |
The different search interfaces (GUI, command line, ...) have different
|
|
|
2204 |
methods to define the set of indexes to be used, see the appropriate
|
|
|
2205 |
section.
|
|
|
2206 |
|
|
|
2207 |
If a set of multiple indexes are to be used together for searches, some
|
|
|
2208 |
configuration parameters must be consistent among the set. These are
|
|
|
2209 |
parameters which need to be the same when indexing and searching. As the
|
|
|
2210 |
parameters come from the main configuration when searching, they need to
|
|
|
2211 |
be compatible with what was set when creating the other indexes (which
|
|
|
2212 |
came from their respective configuration directories. Most of the relevant
|
|
|
2213 |
parameters are described in the following linked section.
|
|
|
2214 |
|
|
|
2215 |
----------------------------------------------------------------------
|
|
|
2216 |
|
|
|
2217 |
Chapter 4. Programming interface
|
2241 |
Chapter 4. Programming interface
|
2218 |
|
2242 |
|
2219 |
Recoll has an Application programming Interface, usable both for indexing
|
2243 |
Recoll has an Application Programming Interface, usable both for indexing
|
2220 |
and searching, currently accessible from the Python language.
|
2244 |
and searching, currently accessible from the Python language.
|
2221 |
|
2245 |
|
2222 |
Another less radical way to extend the application is to write filters for
|
2246 |
Another less radical way to extend the application is to write filters for
|
2223 |
new types of documents.
|
2247 |
new types of documents.
|
2224 |
|
2248 |
|
|
... |
|
... |
2235 |
|
2259 |
|
2236 |
As of Recoll 1.13, there are two kinds of filters:
|
2260 |
As of Recoll 1.13, there are two kinds of filters:
|
2237 |
|
2261 |
|
2238 |
* Simple filters (the old ones) run once and exit. They can be bare
|
2262 |
* Simple filters (the old ones) run once and exit. They can be bare
|
2239 |
programs like antiword, or shell-scripts using other programs. They
|
2263 |
programs like antiword, or shell-scripts using other programs. They
|
2240 |
are very simple to write, just having to write the text to the
|
2264 |
are very simple to write, because they just need to output the
|
2241 |
standard output.
|
2265 |
converted text to the standard output.
|
2242 |
|
2266 |
|
2243 |
* Multiple filters, new in 1.13, run as long as their master process
|
2267 |
* Multiple filters, new in 1.13, run as long as their master process
|
2244 |
(ie: recollindex) is active. They can process multiple files (sparing
|
2268 |
(ie: recollindex) is active. They can process multiple files (sparing
|
2245 |
the process startup time which can be very significant), or multiple
|
2269 |
the process startup time which can be very significant), or multiple
|
2246 |
documents per file (ie: for zip or chm files). They communicate with
|
2270 |
documents per file (ie: for zip or chm files). They communicate with
|
|
... |
|
... |
2268 |
|
2292 |
|
2269 |
Filters are called with a single argument which is the source file name.
|
2293 |
Filters are called with a single argument which is the source file name.
|
2270 |
They should output the result to stdout.
|
2294 |
They should output the result to stdout.
|
2271 |
|
2295 |
|
2272 |
When writing a filter, you should decide if it will output plain text or
|
2296 |
When writing a filter, you should decide if it will output plain text or
|
2273 |
html. Plain text is simpler, but you will not be able to add metadata or
|
2297 |
HTML. Plain text is simpler, but you will not be able to add metadata or
|
2274 |
vary the output character encoding (this will be defined in a
|
2298 |
vary the output character encoding (this will be defined in a
|
2275 |
configuration file). Additionally, some formatting may easier to preserve
|
2299 |
configuration file). Additionally, some formatting may be easier to
|
2276 |
when previewing html. Actually the deciding factor is metadata: Recoll has
|
2300 |
preserve when previewing HTML. Actually the deciding factor is metadata:
|
2277 |
a way to extract metadata from the html header and use it for field
|
2301 |
Recoll has a way to extract metadata from the HTML header and use it for
|
2278 |
searches..
|
2302 |
field searches.
|
2279 |
|
2303 |
|
2280 |
The RECOLL_FILTER_FORPREVIEW environment variable (values yes, no) tells
|
2304 |
The RECOLL_FILTER_FORPREVIEW environment variable (values yes, no) tells
|
2281 |
the filter if the operation is for indexing or previewing. Some filters
|
2305 |
the filter if the operation is for indexing or previewing. Some filters
|
2282 |
use this to output a slightly different format, for example stripping
|
2306 |
use this to output a slightly different format, for example stripping
|
2283 |
uninteresting repeated keywords (ie: Subject: for email) when indexing.
|
2307 |
uninteresting repeated keywords (ie: Subject: for email) when indexing.
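
As an illustration only, a simple filter following the conventions above
(one file name argument, result on stdout, RECOLL_FILTER_FORPREVIEW set to
yes or no) might look like the Python sketch below. The input format
(plain UTF-8 text with a Subject: line) and the function names are
assumptions made for this example; real filters are usually written
around an external converter.

```python
import os
import sys


def convert(path, for_preview):
    """Return the text to emit for the file at path."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        lines = handle.read().splitlines()
    if not for_preview:
        # When indexing, drop the repeated "Subject:" keyword, keep its value.
        lines = [
            line[len("Subject:"):].lstrip() if line.startswith("Subject:") else line
            for line in lines
        ]
    return "\n".join(lines) + "\n"


def main(argv, environ):
    """Entry point: argv[1] is the source file name."""
    if len(argv) != 2:
        sys.stderr.write("usage: filter <source file>\n")
        return 1
    for_preview = environ.get("RECOLL_FILTER_FORPREVIEW", "no") == "yes"
    sys.stdout.write(convert(argv[1], for_preview))
    return 0

# A standalone script would finish with:
#     sys.exit(main(sys.argv, os.environ))
```

Keeping the conversion in its own function makes it easy to check both
the indexing and the preview output without running the indexer.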
|
|
... |
|
... |
2349 |
|
2373 |
|
2350 |
You should take care to escape some characters inside the text by
|
2374 |
You should take care to escape some characters inside the text by
|
2351 |
transforming them into appropriate entities. "&" should be transformed
|
2375 |
transforming them into appropriate entities. "&" should be transformed
|
2352 |
into "&amp;", "<" should be transformed into "&lt;". This is not always
|
2376 |
into "&amp;", "<" should be transformed into "&lt;". This is not always
|
2353 |
properly done by translating programs which output HTML, and of course
|
2377 |
properly done by translating programs which output HTML, and of course
|
2354 |
nerver by those which output plain text.
|
2378 |
never by those which output plain text.
|
2355 |
|
2379 |
|
2356 |
The character set needs to be specified in the header. It does not need to
|
2380 |
The character set needs to be specified in the header. It does not need to
|
2357 |
be UTF-8 (Recoll will take care of translating it), but it must be
|
2381 |
be UTF-8 (Recoll will take care of translating it), but it must be
|
2358 |
accurate for good results.
|
2382 |
accurate for good results.
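
The two points above (entity escaping and declaring the character set in
the header) can be sketched in Python as follows. The helper name
to_filter_html and the exact page layout are assumptions for this
example; html.escape is the standard library call that replaces the
ampersand and angle bracket characters with the corresponding entities.

```python
import html


def to_filter_html(text, charset="utf-8", title=""):
    """Wrap plain text into a minimal HTML page for the indexer.

    charset must describe the actual encoding used when the returned
    string is written out.
    """
    # quote=False: only &, < and > are escaped, which is what body text needs.
    body = html.escape(text, quote=False)
    return (
        "<html><head>\n"
        f"<title>{html.escape(title, quote=False)}</title>\n"
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">\n'
        "</head><body>\n"
        f"<pre>{body}</pre>\n"
        "</body></html>\n"
    )
```

The declared character set does not have to be UTF-8, but it has to match
the bytes actually written to the standard output.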
|
2359 |
|
2383 |
|
|
... |
|
... |
2405 |
results.
|
2429 |
results.
|
2406 |
|
2430 |
|
2407 |
A field can be either or both indexed and stored. This and other aspects
|
2431 |
A field can be either or both indexed and stored. This and other aspects
|
2408 |
of fields handling is defined inside the fields configuration file.
|
2432 |
of fields handling is defined inside the fields configuration file.
|
2409 |
|
2433 |
|
|
|
2434 |
The sequence of events for field processing is as follows:
|
|
|
2435 |
|
|
|
2436 |
* During indexing, recollindex scans all meta fields in HTML documents
|
|
|
2437 |
(most document types are transformed into HTML at some point). It
|
|
|
2438 |
compares the name for each element to the configuration defining what
|
|
|
2439 |
should be done with fields (the fields file)
|
|
|
2440 |
|
|
|
2441 |
* If the name for the meta element matches one for a field that should
|
|
|
2442 |
be indexed, the contents are processed and the terms are entered into
|
|
|
2443 |
the index with the prefix defined in the fields file.
|
|
|
2444 |
|
|
|
2445 |
* If the name for the meta element matches one for a field that should
|
|
|
2446 |
be stored, the content of the element is stored with the document data
|
|
|
2447 |
record, from which it can be extracted and displayed at query time.
|
|
|
2448 |
|
|
|
2449 |
* At query time, if a field search is performed, the index prefix is
|
|
|
2450 |
computed and the match is only performed against appropriately
|
|
|
2451 |
prefixed terms in the index.
|
|
|
2452 |
|
|
|
2453 |
* At query time, the field can be displayed inside the result list by
|
|
|
2454 |
using the appropriate directive in the definition of the result list
|
|
|
2455 |
paragraph format. All fields are displayed on the fields screen of the
|
|
|
2456 |
preview window (which you can reach through the right-click menu).
|
|
|
2457 |
This is independent of the fact that the search which produced the
|
|
|
2458 |
results used the field or not.
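
The indexing side of this sequence can be modelled with a short Python
sketch. It is a simplified illustration, not what recollindex actually
runs: the fields_config dictionary stands in for the fields file, and the
prefix "A" and the field names are hypothetical.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> elements from HTML."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]


def process_fields(document_html, fields_config):
    """Return (index_terms, stored) for one document.

    fields_config maps a field name to {"prefix": str or None,
    "stored": bool}, a stand-in for the fields configuration file.
    """
    collector = MetaCollector()
    collector.feed(document_html)
    index_terms, stored = [], {}
    for name, content in collector.meta.items():
        conf = fields_config.get(name)
        if conf is None:
            continue  # Field not mentioned in the configuration: ignored.
        if conf.get("prefix"):
            # Indexed field: terms enter the index with the field prefix.
            index_terms.extend(conf["prefix"] + w.lower() for w in content.split())
        if conf.get("stored"):
            # Stored field: kept with the document data for display.
            stored[name] = content
    return index_terms, stored
```

In this model a field search only has to look at terms carrying the
field prefix, while display at query time only uses the stored values.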
|
|
|
2459 |
|
2410 |
You can find more information in the section about the fields file, or in
comments inside the file.

You can also have a look at the example on the Wiki, detailing how one
could add a page count field to PDF documents for displaying inside result
lists.

----------------------------------------------------------------------

4.3. API

...

4.3.2.1. Introduction

Recoll versions after 1.11 define a Python programming interface, both for
searching and indexing.

The Python interface can be found in the source package, under
python/recoll.

In order to build the module, you should first build or re-build the
Recoll library using position-independent objects:

    cd recoll-xxx/

...

Note that the translation is not limited to a single character:
you could very well have something like u:ue in the list.

The default value set for unac_except_trans can't be listed here
because I have trouble with SGML and UTF-8, but it only contains
ligature decompositions: German ss, oe, ae, fi, fl.

This parameter can't be defined for subdirectories. It is global,
because there is no way to do otherwise when querying. If you have
document sets which would need different values, you will have to
index and query them separately.

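The effect of an exception list on accent stripping can be sketched in
Python. This is only an illustration and not Recoll's unac code: the
EXCEPT_TRANS table, its dictionary form and the strip_accents function are
assumptions made for the example, and the case folding step is a
simplification.

```python
# Sketch of accent stripping with exceptions; not Recoll's unac code.
import unicodedata

# Hypothetical exception table: source character -> translation.
# A translation may be longer than one character, and a character can
# be mapped to itself to keep its accent.
EXCEPT_TRANS = {"\u00fc": "ue", "\u00df": "ss", "\u00e5": "\u00e5"}


def strip_accents(text, exceptions=EXCEPT_TRANS):
    out = []
    # Case is folded first, so the table only needs lowercase keys.
    for ch in text.lower():
        if ch in exceptions:
            # Excepted character: use its translation as is.
            out.append(exceptions[ch])
            continue
        # Default: decompose and drop the combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed
                           if not unicodedata.combining(c)))
    return "".join(out)
```

With this table, u-umlaut becomes "ue" instead of "u", while an accented
character that is not in the table, such as e-acute, is reduced to its
base letter.
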