|
a/src/README |
|
b/src/README |
|
... |
|
... |
44 |
|
44 |
|
45 |
2.2.1. Xapian index formats
|
45 |
2.2.1. Xapian index formats
|
46 |
|
46 |
|
47 |
2.2.2. Security aspects
|
47 |
2.2.2. Security aspects
|
48 |
|
48 |
|
49 |
2.3. Indexing configuration
|
49 |
2.3. Index configuration
|
50 |
|
50 |
|
|
|
51 |
2.3.1. Index case and diacritics sensitivity
|
|
|
52 |
|
51 |
2.3.1. The indexing configuration GUI
|
53 |
2.3.2. The index configuration GUI
|
52 |
|
54 |
|
53 |
2.4. Using Beagle WEB browser plugins
|
55 |
2.4. Using Beagle WEB browser plugins
|
54 |
|
56 |
|
55 |
2.5. Periodic indexing
|
57 |
2.5. Periodic indexing
|
56 |
|
58 |
|
|
... |
|
... |
100 |
|
102 |
|
101 |
3.4. The query language
|
103 |
3.4. The query language
|
102 |
|
104 |
|
103 |
3.4.1. Modifiers
|
105 |
3.4.1. Modifiers
|
104 |
|
106 |
|
|
|
107 |
3.5. Search case and diacritics sensitivity
|
|
|
108 |
|
105 |
3.5. Anchored searches and wildcards
|
109 |
3.6. Anchored searches and wildcards
|
106 |
|
110 |
|
107 |
3.5.1. More about wildcards
|
111 |
3.6.1. More about wildcards
|
108 |
|
112 |
|
109 |
3.5.2. Anchored searches
|
113 |
3.6.2. Anchored searches
|
110 |
|
114 |
|
111 |
3.6. Desktop integration
|
115 |
3.7. Desktop integration
|
112 |
|
116 |
|
113 |
3.6.1. Hotkeying recoll
|
117 |
3.7.1. Hotkeying recoll
|
114 |
|
118 |
|
115 |
3.6.2. The KDE Kicker Recoll applet
|
119 |
3.7.2. The KDE Kicker Recoll applet
|
116 |
|
120 |
|
117 |
3.7. Multiple databases
|
121 |
3.8. Multiple databases
|
118 |
|
122 |
|
119 |
4. Programming interface
|
123 |
4. Programming interface
|
120 |
|
124 |
|
121 |
4.1. Writing a document filter
|
125 |
4.1. Writing a document filter
|
122 |
|
126 |
|
123 |
4.1.1. Simple filters
|
127 |
4.1.1. Simple filters
|
124 |
|
128 |
|
125 |
4.1.2. Telling Recoll about the filter
|
129 |
4.1.2. Telling Recoll about the filter
|
126 |
|
130 |
|
127 |
4.1.3. Filter HTML output
|
131 |
4.1.3. Filter HTML output
|
|
|
132 |
|
|
|
133 |
4.1.4. Page numbers
|
128 |
|
134 |
|
129 |
4.2. Field data processing
|
135 |
4.2. Field data processing
|
130 |
|
136 |
|
131 |
4.3. API
|
137 |
4.3. API
|
132 |
|
138 |
|
|
... |
|
... |
248 |
Stemming is the process by which Recoll reduces words to their radicals so
|
254 |
Stemming is the process by which Recoll reduces words to their radicals so
|
249 |
that searching does not depend, for example, on a word being singular or
|
255 |
that searching does not depend, for example, on a word being singular or
|
250 |
plural (floor, floors), or on a verb tense (flooring, floored). Because
|
256 |
plural (floor, floors), or on a verb tense (flooring, floored). Because
|
251 |
the mechanisms used for stemming depend on the specific grammatical rules
|
257 |
the mechanisms used for stemming depend on the specific grammatical rules
|
252 |
for each language, there is a separate stemmer module for most common
|
258 |
for each language, there is a separate stemmer module for most common
|
253 |
languages where stemming makes sense. Storing documents written in
|
259 |
languages where stemming makes sense.
|
254 |
different languages in the same index is possible, and commonly done. In
|
260 |
|
255 |
this situation, you can specify several stemming languages for the index.
|
|
|
256 |
Recoll stores the unstemmed versions of terms in the main index and uses
|
261 |
Recoll stores the unstemmed versions of terms in the main index and uses
|
257 |
auxiliary databases for term expansion (one for each stemming language),
|
262 |
auxiliary databases for term expansion (one for each stemming language),
|
258 |
which means that you can switch stemming languages between searches, or
|
263 |
which means that you can switch stemming languages between searches, or
|
259 |
add a language without needing a full reindex. Recoll currently makes no
|
264 |
add a language without needing a full reindex.
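
A minimal sketch of the idea above, assuming a toy suffix-stripping stemmer: the main index keeps the unstemmed terms, and a separate per-language table maps each stem to the index terms that share it. The names toy_stem, build_expansion_db and expand are hypothetical and are not part of Recoll, which relies on real stemmer modules.

```python
from collections import defaultdict


def toy_stem(word):
    """Toy English stemmer (illustration only): strip a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build_expansion_db(index_terms, stemmer):
    """Build an auxiliary table mapping each stem to the unstemmed index terms."""
    db = defaultdict(set)
    for term in index_terms:
        db[stemmer(term)].add(term)
    return db


def expand(term, db, stemmer):
    """Expand a query term to every index term sharing its stem."""
    return sorted(db.get(stemmer(term), {term}))


index_terms = {"floor", "floors", "flooring", "floored", "door"}
english_db = build_expansion_db(index_terms, toy_stem)
print(expand("floors", english_db, toy_stem))
```

Because the table is derived from the unstemmed terms, adding a language in this sketch only means building one more table, not touching the main index.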
|
260 |
attempt at automatic language recognition, which means that the stemmer
|
265 |
|
261 |
will sometimes be applied to terms from other languages with potentially
|
266 |
Storing documents written in different languages in the same index is
|
262 |
strange results. In practice, even if this introduces possibilities of
|
267 |
possible, and commonly done. In this situation, you can specify several
|
263 |
confusion, this approach has been proven quite useful, and, awaiting the
|
268 |
stemming languages for the index.
|
264 |
addition of an automatic language recognition module to Recoll, it is much
|
269 |
|
265 |
less cumbersome than separating your documents according to what language
|
270 |
Recoll currently makes no attempt at automatic language recognition, which
|
266 |
they are written in.
|
271 |
means that the stemmer will sometimes be applied to terms from other
|
|
|
272 |
languages with potentially strange results. In practice, even if this
|
|
|
273 |
introduces possibilities of confusion, this approach has been proven quite
|
|
|
274 |
useful, and, awaiting the addition of an automatic language recognition
|
|
|
275 |
module to Recoll, it is much less cumbersome than separating your
|
|
|
276 |
documents according to what language they are written in.
|
|
|
277 |
|
|
|
278 |
Before version 1.18, Recoll always stripped most accents and diacritics
|
|
|
279 |
from terms, and converted them to lower case before storing them in the
|
|
|
280 |
index. As a consequence, it was impossible to search for a particular
|
|
|
281 |
capitalization of a term (US / us), or to discriminate two terms based on
|
|
|
282 |
diacritics (sake / saké, mate / maté).
|
|
|
283 |
|
|
|
284 |
As of version 1.18, Recoll can optionally store the raw terms, without
|
|
|
285 |
accent stripping or case conversion. Expansions necessary for searches
|
|
|
286 |
insensitive to case and/or diacritics are then performed when searching.
|
|
|
287 |
This is described in more detail in the section about index case and
|
|
|
288 |
diacritics sensitivity.
|
267 |
|
289 |
|
268 |
Recoll has many parameters which define exactly what to index, and how to
|
290 |
Recoll has many parameters which define exactly what to index, and how to
|
269 |
classify and decode the source documents. These are kept in configuration
|
291 |
classify and decode the source documents. These are kept in configuration
|
270 |
files. A default configuration is copied into a standard location (usually
|
292 |
files. A default configuration is copied into a standard location (usually
|
271 |
something like /usr/[local/]share/recoll/examples) during installation.
|
293 |
something like /usr/[local/]share/recoll/examples) during installation.
|
|
... |
|
... |
350 |
search precision.
|
372 |
search precision.
|
351 |
|
373 |
|
352 |
The generated indexes can be queried concurrently in a transparent manner.
|
374 |
The generated indexes can be queried concurrently in a transparent manner.
|
353 |
|
375 |
|
354 |
For index generation, multiple configurations are totally independent from
|
376 |
For index generation, multiple configurations are totally independent from
|
355 |
each other. When multiple indexes are used for searches, some parameters
|
377 |
each other. When multiple indexes need to be used for a single search,
|
356 |
should be consistent among the configurations.
|
378 |
some parameters should be consistent among the configurations.
|
357 |
|
379 |
|
358 |
----------------------------------------------------------------------
|
380 |
----------------------------------------------------------------------
|
359 |
|
381 |
|
360 |
2.1.3. Document types
|
382 |
2.1.3. Document types
|
361 |
|
383 |
|
|
... |
|
... |
478 |
need for your index, set the directory and files access modes
|
500 |
need for your index, set the directory and files access modes
|
479 |
appropriately, and also maybe adjust the umask used during index updates.
|
501 |
appropriately, and also maybe adjust the umask used during index updates.
|
480 |
|
502 |
|
481 |
----------------------------------------------------------------------
|
503 |
----------------------------------------------------------------------
|
482 |
|
504 |
|
483 |
2.3. Indexing configuration
|
505 |
2.3. Index configuration
|
484 |
|
506 |
|
485 |
Variables set inside the Recoll configuration files control which areas of
|
507 |
Variables set inside the Recoll configuration files control which areas of
|
486 |
the file system are indexed, and how files are processed. These variables
|
508 |
the file system are indexed, and how files are processed. These variables
|
487 |
can be set either by editing the text files or using the dialogs in the
|
509 |
can be set either by editing the text files or using the dialogs in the
|
488 |
recoll GUI.
|
510 |
recoll GUI.
|
|
... |
|
... |
504 |
(ie: pdf, postscript, ms-word...) are described in the external packages
|
526 |
(ie: pdf, postscript, ms-word...) are described in the external packages
|
505 |
section.
|
527 |
section.
|
506 |
|
528 |
|
507 |
----------------------------------------------------------------------
|
529 |
----------------------------------------------------------------------
|
508 |
|
530 |
|
|
|
531 |
2.3.1. Index case and diacritics sensitivity
|
|
|
532 |
|
|
|
533 |
As of Recoll version 1.18 you have a choice of building an index with
|
|
|
534 |
terms stripped of character case and diacritics, or one with raw terms.
|
|
|
535 |
For a source term of Résumé, the former will store resume, the latter
|
|
|
536 |
Résumé.
|
|
|
537 |
|
|
|
538 |
Each type of index allows performing searches insensitive to case and
|
|
|
539 |
diacritics: with a raw index, the user entry will be expanded to match all
|
|
|
540 |
case and diacritics variations present in the index. With a stripped
|
|
|
541 |
index, the search term will be stripped before searching.
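
The two approaches described above can be sketched as follows. This is only an illustration under stated assumptions: strip_term and expand_term are hypothetical helpers, and the stripping shown (Unicode decomposition, removal of combining marks, lower-casing) is an approximation, not Recoll's actual implementation.

```python
import unicodedata


def strip_term(term):
    """Remove combining marks (accents) from a term and lower-case it."""
    decomposed = unicodedata.normalize("NFD", term)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.lower()


def expand_term(term, raw_index_terms):
    """With a raw index: find every stored variant of the user entry."""
    key = strip_term(term)
    return sorted(t for t in raw_index_terms if strip_term(t) == key)


# Stripped index: the term is reduced before being stored or searched.
print(strip_term("Résumé"))

# Raw index: the entry is expanded to the variations present in the index.
raw_terms = {"Résumé", "resume", "RESUME", "result"}
print(expand_term("resume", raw_terms))
```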
|
|
|
542 |
|
|
|
543 |
A raw index allows for another possibility which a stripped index cannot
|
|
|
544 |
offer: using case and diacritics to discriminate between terms, returning
|
|
|
545 |
different results when searching for US and us or resume and résumé. Read
|
|
|
546 |
the section about search case and diacritics sensitivity for more details.
|
|
|
547 |
|
|
|
548 |
The type of index to be created is controlled by the indexStripChars
|
|
|
549 |
configuration variable which can only be changed by editing the
|
|
|
550 |
configuration file. Any change implies an index reset (not automated by
|
|
|
551 |
Recoll), and all indexes in a search must be set in the same way (again,
|
|
|
552 |
not checked by Recoll).
|
|
|
553 |
|
|
|
554 |
If indexStripChars is not set, Recoll 1.18 creates a stripped index by
|
|
|
555 |
default, for compatibility with previous versions.
|
|
|
556 |
|
|
|
557 |
As a cost for added capability, a raw index will be slightly bigger than a
|
|
|
558 |
stripped one (around 10%). Also, searches will be more complex, so
|
|
|
559 |
probably slightly slower, and the feature is still young, and a certain
|
|
|
560 |
amount of weirdness cannot be excluded.
|
|
|
561 |
|
|
|
562 |
----------------------------------------------------------------------
|
|
|
563 |
|
509 |
2.3.1. The indexing configuration GUI
|
564 |
2.3.2. The index configuration GUI
|
510 |
|
565 |
|
511 |
Most parameters for a given indexing configuration can be set from a
|
566 |
Most parameters for a given index configuration can be set from a recoll
|
512 |
recoll GUI running on this configuration (either as default, or by setting
|
567 |
GUI running on this configuration (either as default, or by setting
|
513 |
RECOLL_CONFDIR or the -c option.)
|
568 |
RECOLL_CONFDIR or the -c option.)
|
514 |
|
569 |
|
515 |
The interface is started from the Preferences->Indexing Configuration menu
|
570 |
The interface is started from the Preferences->Index Configuration menu
|
516 |
entry. It is divided in three tabs, Global parameters, Local parameters,
|
571 |
entry. It is divided in four tabs, Global parameters, Local parameters,
|
517 |
and Beagle web history, which is explained in the next section.
|
572 |
Beagle web history (which is explained in the next section) and Search
|
|
|
573 |
parameters.
|
518 |
|
574 |
|
519 |
The first tab allows setting global variables, like the lists of top
|
575 |
The Global parameters tab allows setting global variables, like the lists
|
520 |
directories, skipped paths, or stemming languages.
|
576 |
of top directories, skipped paths, or stemming languages.
|
521 |
|
577 |
|
522 |
The second tab allows setting variables that can be redefined for
|
578 |
The Local parameters tab allows setting variables that can be redefined
|
523 |
subdirectories. This second tab has an initially empty list of
|
579 |
for subdirectories. This second tab has an initially empty list of
|
524 |
customisation directories, to which you can add. The variables are then
|
580 |
customisation directories, to which you can add. The variables are then
|
525 |
set for the currently selected directory (or at the top level if the empty
|
581 |
set for the currently selected directory (or at the top level if the empty
|
526 |
line is selected).
|
582 |
line is selected).
|
|
|
583 |
|
|
|
584 |
The Search parameters section defines parameters which are used at query
|
|
|
585 |
time, but are global to an index and affect all search tools, not only the
|
|
|
586 |
GUI.
|
527 |
|
587 |
|
528 |
The meaning for most entries in the interface is self-evident and
|
588 |
The meaning for most entries in the interface is self-evident and
|
529 |
documented by a ToolTip popup on the text label. For more detail, you will
|
589 |
documented by a ToolTip popup on the text label. For more detail, you will
|
530 |
need to refer to the configuration section of this guide.
|
590 |
need to refer to the configuration section of this guide.
|
531 |
|
591 |
|
|
... |
|
... |
548 |
still use the Firefox plugin, which is written in Javascript and
|
608 |
still use the Firefox plugin, which is written in Javascript and
|
549 |
completely independent of C#, Beagle, Lucene..., and set Recoll to process
|
609 |
completely independent of C#, Beagle, Lucene..., and set Recoll to process
|
550 |
the Beagle queue directory. This supposes that Beagle is not running, else
|
610 |
the Beagle queue directory. This supposes that Beagle is not running, else
|
551 |
both programs will fight for the same files.
|
611 |
both programs will fight for the same files.
|
552 |
|
612 |
|
553 |
This feature can be enabled in the GUI indexing configuration panel, or by
|
613 |
This feature can be enabled in the GUI Index configuration panel, or by
|
554 |
editing the configuration file (set processbeaglequeue to 1).
|
614 |
editing the configuration file (set processbeaglequeue to 1).
|
555 |
|
615 |
|
556 |
There are more recent instructions about how to find and install the
|
616 |
There are more recent instructions about how to find and install the
|
557 |
Firefox extension on the Recoll wiki.
|
617 |
Firefox extension on the Recoll wiki.
|
558 |
|
618 |
|
|
... |
|
... |
853 |
single preview window by typing Shift+ArrowUp/Down in the window).
|
913 |
single preview window by typing Shift+ArrowUp/Down in the window).
|
854 |
|
914 |
|
855 |
Clicking the Open link will attempt to start an external viewer. The
|
915 |
Clicking the Open link will attempt to start an external viewer. The
|
856 |
viewer for each document type can be configured through the user
|
916 |
viewer for each document type can be configured through the user
|
857 |
preferences dialog, or by editing the mimeview configuration file. You can
|
917 |
preferences dialog, or by editing the mimeview configuration file. You can
|
858 |
also check the Use desktop preferences option in the user preferences
|
918 |
also check the Use desktop preferences option in the GUI preferences
|
859 |
dialog to use the desktop defaults for all documents. This is probably the
|
919 |
dialog to use the desktop defaults for all documents. This is probably the
|
860 |
best option if you are using a well configured Gnome or KDE desktop.
|
920 |
best option if you are using a well configured Gnome or KDE desktop.
|
861 |
|
921 |
|
862 |
The Preview and Open edit links may not be present for all entries,
|
922 |
The Preview and Open edit links may not be present for all entries,
|
863 |
meaning that Recoll has no configured way to preview a given file type
|
923 |
meaning that Recoll has no configured way to preview a given file type
|
|
... |
|
... |
901 |
* Find similar
|
961 |
* Find similar
|
902 |
|
962 |
|
903 |
* Preview Parent document
|
963 |
* Preview Parent document
|
904 |
|
964 |
|
905 |
* Open Parent document
|
965 |
* Open Parent document
|
|
|
966 |
|
|
|
967 |
* Open Snippets Window
|
906 |
|
968 |
|
907 |
The Preview and Open entries do the same thing as the corresponding links.
|
969 |
The Preview and Open entries do the same thing as the corresponding links.
|
908 |
|
970 |
|
909 |
The Copy File Name and Copy Url copy the relevant data to the clipboard,
|
971 |
The Copy File Name and Copy Url copy the relevant data to the clipboard,
|
910 |
for later pasting.
|
972 |
for later pasting.
|
|
... |
|
... |
927 |
appear for an email which is part of an mbox folder file, but that you
|
989 |
appear for an email which is part of an mbox folder file, but that you
|
928 |
can't actually visualize the folder (there will be an error dialog if you
|
990 |
can't actually visualize the folder (there will be an error dialog if you
|
929 |
try). Recoll is unfortunately not yet smart enough to disable the entry in
|
991 |
try). Recoll is unfortunately not yet smart enough to disable the entry in
|
930 |
this case. In other cases, the Open option makes sense, for example to
|
992 |
this case. In other cases, the Open option makes sense, for example to
|
931 |
start a chm viewer on the parent document for a help page.
|
993 |
start a chm viewer on the parent document for a help page.
|
|
|
994 |
|
|
|
995 |
The Open Snippets Window entry will only appear for documents which
|
|
|
996 |
support page breaks (typically PDF, Postscript, DVI). The snippets window
|
|
|
997 |
lists extracts from the document, taken around search term occurrences,
|
|
|
998 |
along with the corresponding page number, as links which can be used to
|
|
|
999 |
start the native viewer on the appropriate page. If the viewer supports
|
|
|
1000 |
it, its search function will also be primed with one of the search terms.
|
932 |
|
1001 |
|
933 |
----------------------------------------------------------------------
|
1002 |
----------------------------------------------------------------------
|
934 |
|
1003 |
|
935 |
3.1.3. The result table
|
1004 |
3.1.3. The result table
|
936 |
|
1005 |
|
|
... |
|
... |
1426 |
the xdg-open utility will be used to open files when you click the
|
1495 |
the xdg-open utility will be used to open files when you click the
|
1427 |
Open link in the result list, instead of the application defined in
|
1496 |
Open link in the result list, instead of the application defined in
|
1428 |
mimeview. xdg-open will in turn use your desktop preferences to choose
|
1497 |
mimeview. xdg-open will in turn use your desktop preferences to choose
|
1429 |
an appropriate application.
|
1498 |
an appropriate application.
|
1430 |
|
1499 |
|
|
|
1500 |
* Exceptions: when using the desktop preferences for opening documents,
|
|
|
1501 |
these are mime types that will still be opened according to Recoll
|
|
|
1502 |
preferences. This is useful for passing parameters like page numbers
|
|
|
1503 |
or search strings to applications that support them (e.g. evince).
|
|
|
1504 |
|
1431 |
* Choose editor applications: this will let you choose the command
|
1505 |
* Choose editor applications: this will let you choose the command
|
1432 |
started by the Open links inside the result list, for specific
|
1506 |
started by the Open links inside the result list, for specific
|
1433 |
document types.
|
1507 |
document types.
|
1434 |
|
1508 |
|
1435 |
* Display category filter as toolbar... this will let you choose if the
|
1509 |
* Display category filter as toolbar... this will let you choose if the
|
|
... |
|
... |
1566 |
substitutions will be performed:
|
1640 |
substitutions will be performed:
|
1567 |
|
1641 |
|
1568 |
* %A. Abstract
|
1642 |
* %A. Abstract
|
1569 |
|
1643 |
|
1570 |
* %D. Date
|
1644 |
* %D. Date
|
|
|
1645 |
|
|
|
1646 |
* %E. Precooked Snippets link (will only appear for documents indexed
|
|
|
1647 |
with page numbers)
|
1571 |
|
1648 |
|
1572 |
* %I. Icon image name. This is normally determined from the mime type.
|
1649 |
* %I. Icon image name. This is normally determined from the mime type.
|
1573 |
The associations are defined inside the mimeconf configuration file.
|
1650 |
The associations are defined inside the mimeconf configuration file.
|
1574 |
If a thumbnail for the file is found at the standard Freedesktop
|
1651 |
If a thumbnail for the file is found at the standard Freedesktop
|
1575 |
location, this will be displayed instead.
|
1652 |
location, this will be displayed instead.
|
|
... |
|
... |
1824 |
* ext specifies the file name extension (Ex: ext:html)
|
1901 |
* ext specifies the file name extension (Ex: ext:html)
|
1825 |
|
1902 |
|
1826 |
The field syntax also supports a few field-like, but special, criteria:
|
1903 |
The field syntax also supports a few field-like, but special, criteria:
|
1827 |
|
1904 |
|
1828 |
* dir for filtering the results on file location (Ex:
|
1905 |
* dir for filtering the results on file location (Ex:
|
1829 |
dir:/home/me/somedir). -dir also works to find results out of the
|
1906 |
dir:/home/me/somedir). -dir also works to find results not in the
|
1830 |
specified directory, only after release 1.15.8. A tilde inside the
|
1907 |
specified directory (release >= 1.15.8). A tilde inside the value will
|
1831 |
value will be expanded to the home directory. dir is not a regular
|
1908 |
be expanded to the home directory. Wildcards will not be expanded. You
|
1832 |
field and only one value makes sense in a query (you can't use
|
1909 |
cannot use OR with dir clauses (this restriction may go away in the
|
1833 |
dir:dir1 OR dir:dir2). Relative paths make sense, for example,
|
1910 |
future).
|
1834 |
dir:share/doc would match either /usr/share/doc or
|
1911 |
|
1835 |
/usr/local/share/doc
|
1912 |
Relative paths also make sense, for example, dir:share/doc would match
|
|
|
1913 |
either /usr/share/doc or /usr/local/share/doc
|
|
|
1914 |
|
|
|
1915 |
Several dir clauses can be specified, both positive and negative. For
|
|
|
1916 |
example the following makes sense:
|
|
|
1917 |
|
|
|
1918 |
dir:recoll dir:src -dir:utils -dir:common
|
|
|
1919 |
|
|
|
1920 |
|
|
|
1921 |
This would select results which have both recoll and src in the path
|
|
|
1922 |
(in any order), and which contain neither utils nor common.
|
|
|
1923 |
|
|
|
1924 |
Another special aspect of dir clauses is that the values in the index
|
|
|
1925 |
are not transcoded to UTF-8, and never lower-cased or unaccented, but
|
|
|
1926 |
stored as binary. This means that you need to enter the values in the
|
|
|
1927 |
exact lower or upper case, and that searches for names with diacritics
|
|
|
1928 |
may sometimes be impossible because of character set conversion
|
|
|
1929 |
issues. Non-ASCII UNIX file paths are an unending source of trouble
|
|
|
1930 |
and are best avoided.
|
|
|
1931 |
|
|
|
1932 |
You need to use double-quotes around the path value if it contains
|
|
|
1933 |
space characters.
|
1836 |
|
1934 |
|
1837 |
* size for filtering the results on file size. Example: size<10000. You
|
1935 |
* size for filtering the results on file size. Example: size<10000. You
|
1838 |
can use <, > or = as operators. You can specify a range like the
|
1936 |
can use <, > or = as operators. You can specify a range like the
|
1839 |
following: size>100 size<1000. The usual k/K, m/M, g/G, t/T can be
|
1937 |
following: size>100 size<1000. The usual k/K, m/M, g/G, t/T can be
|
1840 |
used as (decimal) multipliers. Ex: size>1k to search for files bigger
|
1938 |
used as (decimal) multipliers. Ex: size>1k to search for files bigger
|
|
... |
|
... |
1911 |
the default is 10.
|
2009 |
the default is 10.
|
1912 |
|
2010 |
|
1913 |
* p can be used to turn the default phrase search into a proximity one
|
2011 |
* p can be used to turn the default phrase search into a proximity one
|
1914 |
(unordered). Example:"order any in"p
|
2012 |
(unordered). Example:"order any in"p
|
1915 |
|
2013 |
|
|
|
2014 |
* C will turn on case sensitivity (if the index supports it).
|
|
|
2015 |
|
|
|
2016 |
* D will turn on diacritics sensitivity (if the index supports it).
|
|
|
2017 |
|
1916 |
* A weight can be specified for a query element by specifying a decimal
|
2018 |
* A weight can be specified for a query element by specifying a decimal
|
1917 |
value at the start of the modifiers. Example: "Important"2.5.
|
2019 |
value at the start of the modifiers. Example: "Important"2.5.
|
1918 |
|
2020 |
|
1919 |
----------------------------------------------------------------------
|
2021 |
----------------------------------------------------------------------
|
1920 |
|
2022 |
|
|
|
2023 |
3.5. Search case and diacritics sensitivity
|
|
|
2024 |
|
|
|
2025 |
For Recoll versions 1.18 and later, and when working with a raw index (not
|
|
|
2026 |
the default), searches can be made sensitive to character case and
|
|
|
2027 |
diacritics. How this happens is controlled by configuration variables and
|
|
|
2028 |
what search data is entered.
|
|
|
2029 |
|
|
|
2030 |
The general default is that searches are insensitive to case and
|
|
|
2031 |
diacritics. An entry of resume will match any of Resume, RESUME, résumé,
|
|
|
2032 |
Résumé etc.
|
|
|
2033 |
|
|
|
2034 |
Two configuration variables can automate switching on sensitivity:
|
|
|
2035 |
|
|
|
2036 |
autodiacsens
|
|
|
2037 |
|
|
|
2038 |
If this is set, search sensitivity to diacritics will be turned on
|
|
|
2039 |
as soon as an accented character exists in a search term. When the
|
|
|
2040 |
variable is set to true, resume will start a
|
|
|
2041 |
diacritics-insensitive search, but résumé will be matched exactly.
|
|
|
2042 |
The default value is false.
|
|
|
2043 |
|
|
|
2044 |
autocasesens
|
|
|
2045 |
|
|
|
2046 |
If this is set, search sensitivity to character case will be
|
|
|
2047 |
turned on as soon as an upper-case character exists in a search
|
|
|
2048 |
term except for the first one. When the variable is set to true,
|
|
|
2049 |
us or Us will start a case-insensitive search, but US will
|
|
|
2050 |
be matched exactly. The default value is true (contrary to
|
|
|
2051 |
autodiacsens).
|
|
|
2052 |
|
|
|
2053 |
As in the past, capitalizing the first letter of a word will turn off its
|
|
|
2054 |
stem expansion and have no effect on case-sensitivity.
|
|
|
2055 |
|
|
|
2056 |
You can also explicitly activate case and diacritics sensitivity by using
|
|
|
2057 |
modifiers with the query language. C will make the term case-sensitive,
|
|
|
2058 |
and D will make it diacritics-sensitive. Examples:
|
|
|
2059 |
|
|
|
2060 |
"us"C
|
|
|
2061 |
|
|
|
2062 |
|
|
|
2063 |
will search for the term us exactly (Us will not be a match).
|
|
|
2064 |
|
|
|
2065 |
"resume"D
|
|
|
2066 |
|
|
|
2067 |
|
|
|
2068 |
will search for the term resume exactly (résumé will not be a match).
|
|
|
2069 |
|
|
|
2070 |
When either case or diacritics sensitivity is activated, stem expansion is
|
|
|
2071 |
turned off. Having both does not make much sense.
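
The rules in this section can be summarized by the following sketch, which assumes a raw index. The function term_options and the TermOptions record are hypothetical names used for illustration only; the defaults mirror the documented defaults of autocasesens (true) and autodiacsens (false).

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class TermOptions:
    case_sensitive: bool
    diac_sensitive: bool
    stem_expansion: bool


def has_diacritics(term):
    """True if the term contains at least one accented character."""
    decomposed = unicodedata.normalize("NFD", term)
    return any(unicodedata.combining(ch) for ch in decomposed)


def term_options(term, modifiers="", autocasesens=True, autodiacsens=False):
    """Decide how a single search term is matched (illustrative only)."""
    # Case sensitivity: explicit C modifier, or an upper-case character
    # anywhere except in first position when autocasesens is set.
    case_sensitive = "C" in modifiers or (
        autocasesens and any(ch.isupper() for ch in term[1:])
    )
    # Diacritics sensitivity: explicit D modifier, or an accented
    # character in the term when autodiacsens is set.
    diac_sensitive = "D" in modifiers or (
        autodiacsens and has_diacritics(term)
    )
    # Stem expansion is off when either sensitivity is on, or when the
    # first letter is capitalized.
    stem_expansion = not (
        case_sensitive or diac_sensitive or term[:1].isupper()
    )
    return TermOptions(case_sensitive, diac_sensitive, stem_expansion)


for entry in ("us", "Us", "US"):
    print(entry, term_options(entry))
```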
|
|
|
2072 |
|
|
|
2073 |
----------------------------------------------------------------------
|
|
|
2074 |
|
1921 |
3.5. Anchored searches and wildcards
|
2075 |
3.6. Anchored searches and wildcards
|
1922 |
|
2076 |
|
1923 |
Some special characters are interpreted by Recoll in search strings to
|
2077 |
Some special characters are interpreted by Recoll in search strings to
|
1924 |
expand or specialize the search. Wildcards expand a root term in
|
2078 |
expand or specialize the search. Wildcards expand a root term in
|
1925 |
controlled ways. Anchor characters can restrict a search to succeed only
|
2079 |
controlled ways. Anchor characters can restrict a search to succeed only
|
1926 |
if the match is found at or near the beginning of the document or one of
|
2080 |
if the match is found at or near the beginning of the document or one of
|
1927 |
its fields.
|
2081 |
its fields.
|
1928 |
|
2082 |
|
1929 |
----------------------------------------------------------------------
|
2083 |
----------------------------------------------------------------------
|
1930 |
|
2084 |
|
1931 |
3.5.1. More about wildcards
|
2085 |
3.6.1. More about wildcards
|
1932 |
|
2086 |
|
1933 |
All words entered in Recoll search fields will be processed for wildcard
|
2087 |
All words entered in Recoll search fields will be processed for wildcard
|
1934 |
expansion before the request is finally executed.
|
2088 |
expansion before the request is finally executed.
|
1935 |
|
2089 |
|
1936 |
The wildcard characters are:
|
2090 |
The wildcard characters are:
|
|
... |
|
... |
1957 |
expansion will produce better results than an ending * (stem expansion
|
2111 |
expansion will produce better results than an ending * (stem expansion
|
1958 |
is turned off when any wildcard character appears in the term).
|
2112 |
is turned off when any wildcard character appears in the term).
|
1959 |
|
2113 |
|
1960 |
----------------------------------------------------------------------
|
2114 |
----------------------------------------------------------------------
|
1961 |
|
2115 |
|
1962 |
3.5.2. Anchored searches
|
2116 |
3.6.2. Anchored searches
|
1963 |
|
2117 |
|
1964 |
Two characters are used to specify that a search hit should occur at the
|
2118 |
Two characters are used to specify that a search hit should occur at the
|
1965 |
beginning or at the end of the text. ^ at the beginning of a term or
|
2119 |
beginning or at the end of the text. ^ at the beginning of a term or
|
1966 |
phrase constrains the search to happen at the start, $ at the end forces it
|
2120 |
phrase constrains the search to happen at the start, $ at the end forces it
|
1967 |
to happen at the end.
|
2121 |
to happen at the end.
|
|
... |
|
... |
1982 |
example, bla bla my unexpected term at the beginning of the text would be
|
2136 |
example, bla bla my unexpected term at the beginning of the text would be
|
1983 |
a match for "^my term"o5.
|
2137 |
a match for "^my term"o5.
|
1984 |
|
2138 |
|
1985 |
----------------------------------------------------------------------
|
2139 |
----------------------------------------------------------------------
|
1986 |
|
2140 |
|
1987 |
3.6. Desktop integration
|
2141 |
3.7. Desktop integration
|
1988 |
|
2142 |
|
1989 |
Being independent of the desktop type has its drawbacks: Recoll desktop
|
2143 |
Being independent of the desktop type has its drawbacks: Recoll desktop
|
1990 |
integration is minimal. Here follow a few things that may help.
|
2144 |
integration is minimal. Here follow a few things that may help.
|
1991 |
|
2145 |
|
1992 |
----------------------------------------------------------------------
|
2146 |
----------------------------------------------------------------------
|
1993 |
|
2147 |
|
1994 |
3.6.1. Hotkeying recoll
|
2148 |
3.7.1. Hotkeying recoll
|
1995 |
|
2149 |
|
1996 |
It is surprisingly convenient to be able to show or hide the Recoll GUI
|
2150 |
It is surprisingly convenient to be able to show or hide the Recoll GUI
|
1997 |
with a single keystroke. Recoll comes with a small Python script, based on
|
2151 |
with a single keystroke. Recoll comes with a small Python script, based on
|
1998 |
the libwnck window manager interface library, which will allow you to do
|
2152 |
the libwnck window manager interface library, which will allow you to do
|
1999 |
just this. The detailed instructions are on this wiki page.
|
2153 |
just this. The detailed instructions are on this wiki page.
|
2000 |
|
2154 |
|
2001 |
----------------------------------------------------------------------
|
2155 |
----------------------------------------------------------------------
|
2002 |
|
2156 |
|
2003 |
3.6.2. The KDE Kicker Recoll applet
|
2157 |
3.7.2. The KDE Kicker Recoll applet
|
2004 |
|
2158 |
|
2005 |
The Recoll source tree contains the source code to the recoll_applet, a
|
2159 |
The Recoll source tree contains the source code to the recoll_applet, a
|
2006 |
small application derived from the find_applet. This can be used to add a
|
2160 |
small application derived from the find_applet. This can be used to add a
|
2007 |
small Recoll launcher to the KDE panel.
|
2161 |
small Recoll launcher to the KDE panel.
|
2008 |
|
2162 |
|
|
... |
|
... |
2021 |
a new recoll GUI instance every time (even if it is already running). You
|
2175 |
a new recoll GUI instance every time (even if it is already running). You
|
2022 |
may find it useful anyway.
|
2176 |
may find it useful anyway.
|
2023 |
|
2177 |
|
2024 |
----------------------------------------------------------------------
|
2178 |
----------------------------------------------------------------------
|
2025 |
|
2179 |
|
2026 |
3.7. Multiple databases
|
2180 |
3.8. Multiple databases
|
2027 |
|
2181 |
|
2028 |
Multiple Recoll databases or indexes can be created by using several
|
2182 |
Multiple Recoll databases or indexes can be created by using several
|
2029 |
configuration directories which are usually set to index different areas
|
2183 |
configuration directories which are usually set to index different areas
|
2030 |
of the file system. A specific index can be selected for updating or
|
2184 |
of the file system. A specific index can be selected for updating or
|
2031 |
searching, using the RECOLL_CONFDIR environment variable or the -c option
|
2185 |
searching, using the RECOLL_CONFDIR environment variable or the -c option
|
|
... |
|
... |
2211 |
|
2365 |
|
2212 |
<meta name="somefield" content="Some textual data" />
|
2366 |
<meta name="somefield" content="Some textual data" />
|
2213 |
|
2367 |
|
2214 |
See the following section for details about configuring how field data is
|
2368 |
See the following section for details about configuring how field data is
|
2215 |
processed by the indexer.
|
2369 |
processed by the indexer.
|
|
|
2370 |
|
|
|
2371 |
----------------------------------------------------------------------
|
|
|
2372 |
|
|
|
2373 |
4.1.4. Page numbers
|
|
|
2374 |
|
|
|
2375 |
The indexer will interpret ^L characters in the filter output as
|
|
|
2376 |
indicating page breaks, and will record them. At query time, this allows
|
|
|
2377 |
starting a viewer on the right page for a hit or a snippet. Currently,
|
|
|
2378 |
only the PDF, Postscript and DVI filters generate page breaks.
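
As an illustration of the page break convention described above, here is a minimal Python sketch (not Recoll's actual code) that records the positions of ^L (form feed) characters in a filter's text output and maps a hit offset back to a page number. The function names and the sample text are assumptions made for the example.

```python
FORM_FEED = "\f"  # the ^L character (ASCII 0x0C)


def page_break_offsets(text):
    """Return the character offset of every form feed in the filter output."""
    return [i for i, ch in enumerate(text) if ch == FORM_FEED]


def page_for_offset(breaks, offset):
    """Return the 1-based page number that contains the given offset."""
    # Every form feed located before the offset closes one page.
    return 1 + sum(1 for b in breaks if b < offset)


# Hypothetical filter output holding three pages.
sample = "first page\fsecond page\fthird page"
breaks = page_break_offsets(sample)
print(page_for_offset(breaks, sample.index("second")))  # prints 2
```

A viewer could then be started on the computed page, provided it accepts a page argument.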
|
2216 |
|
2379 |
|
2217 |
----------------------------------------------------------------------
|
2380 |
----------------------------------------------------------------------
|
2218 |
|
2381 |
|
2219 |
4.2. Field data processing
|
2382 |
4.2. Field data processing
|
2220 |
|
2383 |
|
|
... |
|
... |
2822 |
|
2985 |
|
2823 |
Recoll indexing options are set inside text configuration files located in
|
2986 |
Recoll indexing options are set inside text configuration files located in
|
2824 |
a configuration directory. There can be several such directories, each of
|
2987 |
a configuration directory. There can be several such directories, each of
|
2825 |
which defines the parameters for one index.
|
2988 |
which defines the parameters for one index.
|
2826 |
|
2989 |
|
2827 |
The configuration files can be edited by hand or through the Indexing
|
2990 |
The configuration files can be edited by hand or through the Index
|
2828 |
configuration dialog (Preferences menu). The GUI tool will try to respect
|
2991 |
configuration dialog (Preferences menu). The GUI tool will try to respect
|
2829 |
your formatting and comments as much as possible, so it is quite possible
|
2992 |
your formatting and comments as much as possible, so it is quite possible
|
2830 |
to use both ways.
|
2993 |
to use both ways.
|
2831 |
|
2994 |
|
2832 |
The most accurate documentation for the configuration parameters is given
|
2995 |
The most accurate documentation for the configuration parameters is given
|
|
... |
|
... |
3019 |
want to index very big text files as it will both reduce memory
|
3182 |
want to index very big text files as it will both reduce memory
|
3020 |
usage at index time and help with loading data to the preview
|
3183 |
usage at index time and help with loading data to the preview
|
3021 |
window. A size of a few megabytes would seem reasonable (default:
|
3184 |
window. A size of a few megabytes would seem reasonable (default:
|
3022 |
1MB).
|
3185 |
1MB).
|
3023 |
|
3186 |
|
|
|
3187 |
membermaxkbs
|
|
|
3188 |
|
|
|
3189 |
This defines the maximum size in kilobytes for an archive member
|
|
|
3190 |
(zip, tar or rar at the moment). Bigger entries will be skipped.
|
|
|
3191 |
|
3024 |
indexallfilenames
|
3192 |
indexallfilenames
|
3025 |
|
3193 |
|
3026 |
Recoll indexes file names in a special section of the database to
|
3194 |
Recoll indexes file names in a special section of the database to
|
3027 |
allow specific file name searches using wildcards. This
|
3195 |
allow specific file name searches using wildcards. This
|
3028 |
parameter decides if file name indexing is performed only for
|
3196 |
parameter decides if file name indexing is performed only for
|
|
... |
|
... |
3056 |
|
3224 |
|
3057 |
Changing some of these parameters will imply a full reindex. Also, when
|
3225 |
Changing some of these parameters will imply a full reindex. Also, when
|
3058 |
using multiple indexes, it may not make sense to search indexes that don't
|
3226 |
using multiple indexes, it may not make sense to search indexes that don't
|
3059 |
share the values for these parameters, because they usually affect both
|
3227 |
share the values for these parameters, because they usually affect both
|
3060 |
search and index operations.
|
3228 |
search and index operations.
|
|
|
3229 |
|
|
|
3230 |
indexStripChars
|
|
|
3231 |
|
|
|
3232 |
Decide if we strip characters of diacritics and convert them to
|
|
|
3233 |
lower-case before terms are indexed. If we don't, searches
|
|
|
3234 |
sensitive to case and diacritics can be performed, but the index
|
|
|
3235 |
will be bigger, and some marginal weirdness may sometimes occur.
|
|
|
3236 |
The default is a stripped index (indexStripChars = 1) for now.
|
|
|
3237 |
When using multiple indexes for a search, this parameter must be
|
|
|
3238 |
defined identically for all. Changing the value implies an index
|
|
|
3239 |
reset.
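
To make the effect of a stripped index concrete, the following Python sketch shows one generic way of removing diacritics and folding case before a term is stored. It is only an approximation under stated assumptions: it uses Unicode decomposition from the standard library, does not reproduce Recoll's own stripping code, and ignores any exception list. The name strip_term is hypothetical.

```python
import unicodedata


def strip_term(term):
    """Remove diacritics and fold to lower case, roughly as a stripped index would."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", term)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.lower()


print(strip_term("Résumé"))    # prints resume
print(strip_term("Ångström"))  # prints angstrom
```

With such a transformation applied at index time, "Résumé", "résumé" and "resume" all map to the same term, which is why case or diacritics sensitive searches need an unstripped index.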
|
|
|
3240 |
|
|
|
3241 |
maxTermExpand
|
|
|
3242 |
|
|
|
3243 |
Maximum expansion count for a single term (e.g.: when using
|
|
|
3244 |
wildcards). The default of 10000 is reasonable and will avoid
|
|
|
3245 |
queries that appear frozen while the engine is walking the term
|
|
|
3246 |
list.
|
|
|
3247 |
|
|
|
3248 |
maxXapianClauses
|
|
|
3249 |
|
|
|
3250 |
Maximum number of elementary clauses we can add to a single Xapian
|
|
|
3251 |
query. In some cases, the result of term expansion can be
|
|
|
3252 |
multiplicative, and we want to avoid using excessive memory. The
|
|
|
3253 |
default of 100 000 should be both high enough in most cases and
|
|
|
3254 |
compatible with current typical hardware configurations.
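
The two limits above can be pictured with a small Python sketch. It is an illustration only: the term list, the function names and the choice to raise an error when the limit is exceeded are assumptions, and the text above does not say how Recoll itself reacts when a limit is hit.

```python
from fnmatch import fnmatchcase


def expand_wildcard(pattern, index_terms, max_term_expand=10000):
    """Expand a wildcard pattern against the term list, up to a maximum count."""
    matches = []
    for term in index_terms:
        if fnmatchcase(term, pattern):
            if len(matches) >= max_term_expand:
                # Assumed behaviour: give up instead of walking the whole list.
                raise ValueError("too many expansions for " + pattern)
            matches.append(term)
    return matches


def clause_count(expansions):
    """Number of elementary clauses if every combination of terms is generated."""
    total = 1
    for terms in expansions:
        total *= len(terms)
    return total


terms = ["index", "indexer", "indexing", "query", "queries"]
first = expand_wildcard("index*", terms)
second = expand_wildcard("quer*", terms)
print(clause_count([first, second]))  # prints 6 (3 expansions times 2)
```

The product in clause_count is what makes expansion multiplicative: two wildcard terms with a few thousand expansions each could already exceed the clause limit.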
|
3061 |
|
3255 |
|
3062 |
nonumbers
|
3256 |
nonumbers
|
3063 |
|
3257 |
|
3064 |
If this is set to true, no terms will be generated for numbers. For
|
3258 |
If this is set to true, no terms will be generated for numbers. For
|
3065 |
example "123", "1.5e6", 192.168.1.4, would not be indexed
|
3259 |
example "123", "1.5e6", 192.168.1.4, would not be indexed
|
|
... |
|
... |
3198 |
|
3392 |
|
3199 |
----------------------------------------------------------------------
|
3393 |
----------------------------------------------------------------------
|
3200 |
|
3394 |
|
3201 |
5.4.1.4. Miscellaneous parameters:
|
3395 |
5.4.1.4. Miscellaneous parameters:
|
3202 |
|
3396 |
|
|
|
3397 |
autodiacsens
|
|
|
3398 |
|
|
|
3399 |
If the index is not stripped, decide if we automatically trigger
|
|
|
3400 |
diacritics sensitivity if the search term has accented characters
|
|
|
3401 |
(not in unac_except_trans). Else you need to use the query
|
|
|
3402 |
language and the D modifier to specify diacritics sensitivity.
|
|
|
3403 |
Default is no.
|
|
|
3404 |
|
|
|
3405 |
autocasesens
|
|
|
3406 |
|
|
|
3407 |
If the index is not stripped, decide if we automatically trigger
|
|
|
3408 |
character case sensitivity if the search term has upper-case
|
|
|
3409 |
characters in any but the first position. Else you need to use the
|
|
|
3410 |
query language and the C modifier to specify character-case
|
|
|
3411 |
sensitivity. Default is yes.
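
The automatic triggers described for autodiacsens and autocasesens can be sketched in Python as follows. This is a simplified illustration, not Recoll's implementation: the function names are hypothetical, the unac_except_trans exceptions are ignored, and the index is assumed to be unstripped.

```python
import unicodedata


def has_inner_uppercase(term):
    """True if an upper-case character appears anywhere but the first position."""
    return any(ch.isupper() for ch in term[1:])


def has_diacritics(term):
    """True if the term contains at least one accented character."""
    decomposed = unicodedata.normalize("NFD", term)
    return any(unicodedata.combining(ch) for ch in decomposed)


def search_sensitivity(term, autocasesens=True, autodiacsens=False):
    """Return (case_sensitive, diacritics_sensitive) for one search term."""
    case_sensitive = autocasesens and has_inner_uppercase(term)
    diac_sensitive = autodiacsens and has_diacritics(term)
    return case_sensitive, diac_sensitive


print(search_sensitivity("Paris"))                     # (False, False)
print(search_sensitivity("MySQL"))                     # (True, False)
print(search_sensitivity("café", autodiacsens=True))   # (False, True)
```

The defaults in the sketch mirror the documented ones: case sensitivity is triggered automatically, diacritics sensitivity is not.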
|
|
|
3412 |
|
3203 |
loglevel,daemloglevel
|
3413 |
loglevel,daemloglevel
|
3204 |
|
3414 |
|
3205 |
Verbosity level for recoll and recollindex. A value of 4 lists
|
3415 |
Verbosity level for recoll and recollindex. A value of 4 lists
|
3206 |
quite a lot of debug/information messages. 2 only lists errors.
|
3416 |
quite a lot of debug/information messages. 2 only lists errors.
|
3207 |
The daemloglevel value is specific to the indexing monitor daemon.
|
3417 |
The daemloglevel value is specific to the indexing monitor daemon.
|
|
... |
|
... |
3235 |
monauxinterval
|
3445 |
monauxinterval
|
3236 |
|
3446 |
|
3237 |
Period (in seconds) at which the real time monitor will regenerate
|
3447 |
Period (in seconds) at which the real time monitor will regenerate
|
3238 |
the auxiliary databases (spelling, stemming) if needed. The
|
3448 |
the auxiliary databases (spelling, stemming) if needed. The
|
3239 |
default is one hour.
|
3449 |
default is one hour.
|
|
|
3450 |
|
|
|
3451 |
monioniceclass, monioniceclassdata
|
|
|
3452 |
|
|
|
3453 |
These allow defining the ionice class and data used by the indexer
|
|
|
3454 |
(default class 3, no data).
|
3240 |
|
3455 |
|
3241 |
filtermaxseconds
|
3456 |
filtermaxseconds
|
3242 |
|
3457 |
|
3243 |
Maximum filter execution time, after which it is aborted. Some
|
3458 |
Maximum filter execution time, after which it is aborted. Some
|
3244 |
postscript programs just loop...
|
3459 |
postscript programs just loop...
|
|
... |
|
... |
3280 |
|
3495 |
|
3281 |
If this is set, the aspell dictionary generation is turned off.
|
3496 |
If this is set, the aspell dictionary generation is turned off.
|
3282 |
Useful for cases where you don't need the functionality or when it
|
3497 |
Useful for cases where you don't need the functionality or when it
|
3283 |
is unusable because aspell crashes during dictionary generation.
|
3498 |
is unusable because aspell crashes during dictionary generation.
|
3284 |
|
3499 |
|
|
|
3500 |
mhmboxquirks
|
|
|
3501 |
|
|
|
3502 |
This allows defining location-related quirks for the mailbox
|
|
|
3503 |
handler. Currently only the tbird flag is defined, and it should
|
|
|
3504 |
be set for directories which hold Thunderbird data, as their
|
|
|
3505 |
folder format is weird.
|
|
|
3506 |
|
3285 |
----------------------------------------------------------------------
|
3507 |
----------------------------------------------------------------------
|
3286 |
|
3508 |
|
3287 |
5.4.2. The fields file
|
3509 |
5.4.2. The fields file
|
3288 |
|
3510 |
|
3289 |
This file contains information about dynamic fields handling in Recoll.
|
3511 |
This file contains information about dynamic fields handling in Recoll.
|
|
... |
|
... |
3392 |
link in a result list. E.g.: HTML is normally displayed using firefox, but
|
3614 |
link in a result list. E.g.: HTML is normally displayed using firefox, but
|
3393 |
you may prefer Konqueror, your openoffice.org program might be named
|
3615 |
you may prefer Konqueror, your openoffice.org program might be named
|
3394 |
oofice instead of openoffice etc.
|
3616 |
oofice instead of openoffice etc.
|
3395 |
|
3617 |
|
3396 |
Changes to this file can be done by direct editing, or through the recoll
|
3618 |
Changes to this file can be done by direct editing, or through the recoll
|
3397 |
user preferences dialog.
|
3619 |
GUI preferences dialog.
|
3398 |
|
3620 |
|
3399 |
If Use desktop preferences to choose document editor is checked in the
|
3621 |
If Use desktop preferences to choose document editor is checked in the
|
3400 |
Recoll GUI user preferences, all mimeview entries will be ignored except
|
3622 |
Recoll GUI preferences, all mimeview entries will be ignored except the
|
3401 |
the one labelled application/x-all (which is set to use xdg-open by
|
3623 |
one labelled application/x-all (which is set to use xdg-open by default).
|
3402 |
default).
|
3624 |
|
|
|
3625 |
In this case, the xallexcepts top level variable defines a list of mime
|
|
|
3626 |
type exceptions which will be processed according to the local entries
|
|
|
3627 |
instead of being passed to the desktop. This is so that specific Recoll
|
|
|
3628 |
options such as a page number or a search string can be passed to
|
|
|
3629 |
applications that support them, such as the evince viewer.
|
3403 |
|
3630 |
|
3404 |
As for the other configuration files, the normal usage is to have a
|
3631 |
As for the other configuration files, the normal usage is to have a
|
3405 |
mimeview inside your own configuration directory, with just the
|
3632 |
mimeview inside your own configuration directory, with just the
|
3406 |
non-default entries, which will override those from the central
|
3633 |
non-default entries, which will override those from the central
|
3407 |
configuration file.
|
3634 |
configuration file.
|
3408 |
|
3635 |
|
3409 |
Please note that these entries must be placed under a [view] section.
|
3636 |
All viewer definition entries must be placed under a [view] section.
|
3410 |
|
3637 |
|
3411 |
The keys in the file are normally mime types. You can add an application
|
3638 |
The keys in the file are normally mime types. You can add an application
|
3412 |
tag to specialize the choice for an area of the filesystem (using a
|
3639 |
tag to specialize the choice for an area of the filesystem (using a
|
3413 |
localfields specification in mimeconf). The syntax for the key is
|
3640 |
localfields specification in mimeconf). The syntax for the key is
|
3414 |
mimetype|tag
|
3641 |
mimetype|tag
|
|
... |
|
... |
3433 |
on the container type. If this appears in the command line, Recoll
|
3660 |
on the container type. If this appears in the command line, Recoll
|
3434 |
will not create a temporary file to extract the subdocument, expecting
|
3661 |
will not create a temporary file to extract the subdocument, expecting
|
3435 |
the called application (possibly a script) to be able to handle it.
|
3662 |
the called application (possibly a script) to be able to handle it.
|
3436 |
|
3663 |
|
3437 |
* %M. Mime type
|
3664 |
* %M. Mime type
|
|
|
3665 |
|
|
|
3666 |
* %p. Page index. Only significant for a subset of document types,
|
|
|
3667 |
currently only PDF, Postscript and DVI files. Can be used to start the
|
|
|
3668 |
editor at the right page for a match or snippet.
|
|
|
3669 |
|
|
|
3670 |
* %s. Search term. The value will only be set for documents with indexed
|
|
|
3671 |
page numbers (ie: PDF). The value will be one of the matched search
|
|
|
3672 |
terms. It would allow pre-setting the value in the "Find" entry inside
|
|
|
3673 |
Evince for example, for easy highlighting of the term.
|
3438 |
|
3674 |
|
3439 |
* %U, %u. Url.
|
3675 |
* %U, %u. Url.
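
The substitutions listed above can be illustrated with a short Python sketch. It is a hedged example rather than Recoll's own expansion code: the function name, the command template and the sample values are assumptions, and only single-letter placeholders are handled. The evince options shown (--page-index and --find) should be checked against the installed evince version.

```python
def expand_viewer_command(template, values):
    """Replace each %X placeholder with the matching entry of values."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        # Substitute only known placeholders, leave anything else untouched.
        if ch == "%" and i + 1 < len(template) and template[i + 1] in values:
            out.append(str(values[template[i + 1]]))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Hypothetical viewer entry using the page index, search term and URL.
template = "evince --page-index=%p --find=%s %u"
values = {"p": 12, "s": "xapian", "u": "file:///tmp/doc.pdf"}
print(expand_viewer_command(template, values))
```

A real implementation would also need to quote the substituted values, or build an argument list, so that search terms containing spaces reach the viewer intact.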
|
3440 |
|
3676 |
|
3441 |
In addition to the predefined values above, all strings like %(fieldname)
|
3677 |
In addition to the predefined values above, all strings like %(fieldname)
|
3442 |
will be replaced by the value of the field named fieldname for the
|
3678 |
will be replaced by the value of the field named fieldname for the
|