|
a/src/README |
|
b/src/README |
|
... |
|
... |
6 |
|
6 |
|
7 |
Jean-Francois Dockes
|
7 |
Jean-Francois Dockes
|
8 |
|
8 |
|
9 |
<jfd@recoll.org>
|
9 |
<jfd@recoll.org>
|
10 |
|
10 |
|
11 |
Copyright (c) 2005-2013 Jean-Francois Dockes
|
11 |
Copyright (c) 2005-2014 Jean-Francois Dockes
|
12 |
|
12 |
|
13 |
Permission is granted to copy, distribute and/or modify this document
|
13 |
Permission is granted to copy, distribute and/or modify this document
|
14 |
under the terms of the GNU Free Documentation License, Version 1.3 or any
|
14 |
under the terms of the GNU Free Documentation License, Version 1.3 or any
|
15 |
later version published by the Free Software Foundation; with no Invariant
|
15 |
later version published by the Free Software Foundation; with no Invariant
|
16 |
Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the
|
16 |
Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the
|
17 |
license can be found at the following location: GNU web site.
|
17 |
license can be found at the following location: GNU web site.
|
18 |
|
18 |
|
19 |
This document introduces full text search notions and describes the
|
19 |
This document introduces full text search notions and describes the
|
20 |
installation and use of the Recoll application. It currently describes
|
20 |
installation and use of the Recoll application. It currently describes
|
21 |
Recoll 1.19.
|
21 |
Recoll 1.20.
|
22 |
|
22 |
|
23 |
----------------------------------------------------------------------
|
23 |
----------------------------------------------------------------------
|
24 |
|
24 |
|
25 |
Table of Contents
|
25 |
Table of Contents
|
26 |
|
26 |
|
|
... |
|
... |
186 |
|
186 |
|
187 |
5.4.6. The ptrans file
|
187 |
5.4.6. The ptrans file
|
188 |
|
188 |
|
189 |
5.4.7. Examples of configuration adjustments
|
189 |
5.4.7. Examples of configuration adjustments
|
190 |
|
190 |
|
191 |
Chapter 1. Introduction
|
191 |
Chapter 1. Introduction
|
192 |
|
192 |
|
193 |
1.1. Giving it a try
|
193 |
1.1. Giving it a try
|
194 |
|
194 |
|
195 |
If you do not like reading manuals (who does?) and would like to give
|
195 |
If you do not like reading manuals (who does?) and would like to give
|
196 |
Recoll a try, just install the application and start the recoll graphical
|
196 |
Recoll a try, just install the application and start the recoll graphical
|
|
... |
|
... |
319 |
options to help you find what you are looking for. However, there are
|
319 |
options to help you find what you are looking for. However, there are
|
320 |
other ways to perform Recoll searches: mostly a command line interface, a
|
320 |
other ways to perform Recoll searches: mostly a command line interface, a
|
321 |
Python programming interface, a KDE KIO slave module, and an Ubuntu Unity
|
321 |
Python programming interface, a KDE KIO slave module, and an Ubuntu Unity
|
322 |
Lens module.
|
322 |
Lens module.
|
323 |
|
323 |
|
324 |
Chapter 2. Indexing
|
324 |
Chapter 2. Indexing
|
325 |
|
325 |
|
326 |
2.1. Introduction
|
326 |
2.1. Introduction
|
327 |
|
327 |
|
328 |
Indexing is the process by which the set of documents is analyzed and the
|
328 |
Indexing is the process by which the set of documents is analyzed and the
|
329 |
data entered into the database. Recoll indexing is normally incremental:
|
329 |
data entered into the database. Recoll indexing is normally incremental:
|
|
... |
|
... |
337 |
|
337 |
|
338 |
2.1.1. Indexing modes
|
338 |
2.1.1. Indexing modes
|
339 |
|
339 |
|
340 |
Recoll indexing can be performed along two different modes:
|
340 |
Recoll indexing can be performed along two different modes:
|
341 |
|
341 |
|
342 |
o Periodic (or batch) indexing: indexing takes place at discrete times,
|
342 |
* Periodic (or batch) indexing: indexing takes place at discrete times,
|
343 |
by executing the recollindex command. The typical usage is to have a
|
343 |
by executing the recollindex command. The typical usage is to have a
|
344 |
nightly indexing run programmed into your cron file.
|
344 |
nightly indexing run programmed into your cron file.
|
345 |
|
345 |
|
346 |
o Real time indexing: indexing takes place as soon as a file is created
|
346 |
* Real time indexing: indexing takes place as soon as a file is created
|
347 |
or changed. recollindex runs as a daemon and uses a file system
|
347 |
or changed. recollindex runs as a daemon and uses a file system
|
348 |
alteration monitor such as inotify, Fam or Gamin to detect file
|
348 |
alteration monitor such as inotify, Fam or Gamin to detect file
|
349 |
changes.
|
349 |
changes.
|
350 |
|
350 |
|
351 |
The choice between the two methods is mostly a matter of preference, and
|
351 |
The choice between the two methods is mostly a matter of preference, and
|
|
... |
|
... |
455 |
|
455 |
|
456 |
The default location for the index data is the xapiandb subdirectory of
|
456 |
The default location for the index data is the xapiandb subdirectory of
|
457 |
the Recoll configuration directory, typically $HOME/.recoll/xapiandb/.
|
457 |
the Recoll configuration directory, typically $HOME/.recoll/xapiandb/.
|
458 |
This can be changed via two different methods (with different purposes):
|
458 |
This can be changed via two different methods (with different purposes):
|
459 |
|
459 |
|
460 |
o You can specify a different configuration directory by setting the
|
460 |
* You can specify a different configuration directory by setting the
|
461 |
RECOLL_CONFDIR environment variable, or using the -c option to the
|
461 |
RECOLL_CONFDIR environment variable, or using the -c option to the
|
462 |
Recoll commands. This method would typically be used to index
|
462 |
Recoll commands. This method would typically be used to index
|
463 |
different areas of the file system to different indexes. For example,
|
463 |
different areas of the file system to different indexes. For example,
|
464 |
if you were to issue the following commands:
|
464 |
if you were to issue the following commands:
|
465 |
|
465 |
|
|
... |
|
... |
473 |
|
473 |
|
474 |
Using multiple configuration directories and configuration options
|
474 |
Using multiple configuration directories and configuration options
|
475 |
allows you to tailor multiple configurations and indexes to handle
|
475 |
allows you to tailor multiple configurations and indexes to handle
|
476 |
whatever subset of the available data you wish to make searchable.
|
476 |
whatever subset of the available data you wish to make searchable.
|
477 |
|
477 |
|
478 |
o For a given configuration directory, you can specify a non-default
|
478 |
* For a given configuration directory, you can specify a non-default
|
479 |
storage location for the index by setting the dbdir parameter in the
|
479 |
storage location for the index by setting the dbdir parameter in the
|
480 |
configuration file (see the configuration section). This method would
|
480 |
configuration file (see the configuration section). This method would
|
481 |
mainly be of use if you wanted to keep the configuration directory in
|
481 |
mainly be of use if you wanted to keep the configuration directory in
|
482 |
its default location, but desired another location for the index,
|
482 |
its default location, but desired another location for the index,
|
483 |
typically out of disk occupation concerns.
|
483 |
typically out of disk occupation concerns.
|
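The two methods above can be combined, and the following Python sketch shows one way to reason about where the index data ends up. It is only an illustration: the function name resolve_index_dir is hypothetical, the precedence of the -c option over RECOLL_CONFDIR is an assumption, and so is the handling of a relative dbdir value (resolved here under the configuration directory).

```python
import posixpath


def resolve_index_dir(environ, c_option=None, dbdir=None):
    """Return (confdir, index_dir) for a hypothetical Recoll invocation.

    environ  -- mapping of environment variables (e.g. os.environ)
    c_option -- value given to the -c option, if any
    dbdir    -- value of the dbdir configuration parameter, if set
    """
    # Assumed precedence: -c option, then RECOLL_CONFDIR, then $HOME/.recoll
    confdir = (
        c_option
        or environ.get("RECOLL_CONFDIR")
        or posixpath.join(environ.get("HOME", ""), ".recoll")
    )
    # Default index location is the xapiandb subdirectory of confdir.
    # posixpath.join keeps an absolute dbdir unchanged; a relative one is
    # placed under confdir (an assumption, not taken from the manual).
    index_dir = posixpath.join(confdir, dbdir or "xapiandb")
    return confdir, index_dir
```

For instance, calling it with only HOME set gives the default location, while passing dbdir="/big/disk/xapiandb" keeps the configuration directory in place and moves only the index.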
|
... |
|
... |
896 |
|
896 |
|
897 |
Recoll provides a configuration option to specify the minimum time before
|
897 |
Recoll provides a configuration option to specify the minimum time before
|
898 |
which a file, specified by a wildcard pattern, cannot be reindexed. See
|
898 |
which a file, specified by a wildcard pattern, cannot be reindexed. See
|
899 |
the mondelaypatterns parameter in the configuration section.
|
899 |
the mondelaypatterns parameter in the configuration section.
|
900 |
|
900 |
|
901 |
Chapter 3. Searching
|
901 |
Chapter 3. Searching
|
902 |
|
902 |
|
903 |
3.1. Searching with the Qt graphical user interface
|
903 |
3.1. Searching with the Qt graphical user interface
|
904 |
|
904 |
|
905 |
The recoll program provides the main user interface for searching. It is
|
905 |
The recoll program provides the main user interface for searching. It is
|
906 |
based on the Qt library.
|
906 |
based on the Qt library.
|
907 |
|
907 |
|
908 |
recoll has two search modes:
|
908 |
recoll has two search modes:
|
909 |
|
909 |
|
910 |
o Simple search (the default, on the main screen) has a single entry
|
910 |
* Simple search (the default, on the main screen) has a single entry
|
911 |
field where you can enter multiple words.
|
911 |
field where you can enter multiple words.
|
912 |
|
912 |
|
913 |
o Advanced search (a panel accessed through the Tools menu or the
|
913 |
* Advanced search (a panel accessed through the Tools menu or the
|
914 |
toolbox bar icon) has multiple entry fields, which you may use to
|
914 |
toolbox bar icon) has multiple entry fields, which you may use to
|
915 |
build a logical condition, with additional filtering on file type,
|
915 |
build a logical condition, with additional filtering on file type,
|
916 |
location in the file system, modification date, and size.
|
916 |
location in the file system, modification date, and size.
|
917 |
|
917 |
|
918 |
In most cases, you can enter the terms as you think them, even if they
|
918 |
In most cases, you can enter the terms as you think them, even if they
|
|
... |
|
... |
952 |
File name will specifically look for file names. The point of having a
|
952 |
File name will specifically look for file names. The point of having a
|
953 |
separate file name search is that wild card expansion can be performed
|
953 |
separate file name search is that wild card expansion can be performed
|
954 |
more efficiently on a small subset of the index (allowing wild cards on
|
954 |
more efficiently on a small subset of the index (allowing wild cards on
|
955 |
the left of terms without excessive penalty). Things to know:
|
955 |
the left of terms without excessive penalty). Things to know:
|
956 |
|
956 |
|
957 |
o White space in the entry should match white space in the file name,
|
957 |
* White space in the entry should match white space in the file name,
|
958 |
and is not treated specially.
|
958 |
and is not treated specially.
|
959 |
|
959 |
|
960 |
o The search is insensitive to character case and accents, independently
|
960 |
* The search is insensitive to character case and accents, independently
|
961 |
of the type of index.
|
961 |
of the type of index.
|
962 |
|
962 |
|
963 |
o An entry without any wild card character and not capitalized will be
|
963 |
* An entry without any wild card character and not capitalized will be
|
964 |
prepended and appended with '*' (ie: etc -> *etc*, but Etc -> etc).
|
964 |
prepended and appended with '*' (ie: etc -> *etc*, but Etc -> etc).
|
965 |
|
965 |
|
966 |
o If you have a big index (many files), excessively generic fragments
|
966 |
* If you have a big index (many files), excessively generic fragments
|
967 |
may result in inefficient searches.
|
967 |
may result in inefficient searches.
|
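The rules above for file name entries can be sketched in Python as follows. This is not Recoll's implementation: the function names are hypothetical, the set of wild card characters (*, ? and [) is an assumption, "capitalized" is taken to mean that the first character is upper case, and accent folding is left out.

```python
import fnmatch

# Assumed set of characters that count as wild cards
WILDCARD_CHARS = "*?["


def expand_filename_entry(entry):
    """Turn a file name search entry into a lower-case wild card pattern."""
    has_wildcard = any(ch in WILDCARD_CHARS for ch in entry)
    capitalized = entry[:1].isupper()
    # The search is case-insensitive, so the pattern is lower-cased
    pattern = entry.lower()
    # Only entries with no wild card and no leading capital get '*' added
    if not has_wildcard and not capitalized:
        pattern = "*" + pattern + "*"
    return pattern


def matches(entry, filename):
    """Check a file name against an entry; white space is matched literally."""
    return fnmatch.fnmatchcase(filename.lower(), expand_filename_entry(entry))
```

With these definitions, the entry etc becomes the pattern *etc* and matches Sketch.txt, while the capitalized entry Etc becomes etc and only matches a file named exactly that.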
968 |
|
968 |
|
969 |
You can search for exact phrases (adjacent words in a given order) by
|
969 |
You can search for exact phrases (adjacent words in a given order) by
|
970 |
enclosing the input inside double quotes. Ex: "virtual reality".
|
970 |
enclosing the input inside double quotes. Ex: "virtual reality".
|
971 |
|
971 |
|
|
... |
|
... |
1032 |
standard desktop tool.
|
1032 |
standard desktop tool.
|
1033 |
|
1033 |
|
1034 |
You may also change the choice of applications by editing the mimeview
|
1034 |
You may also change the choice of applications by editing the mimeview
|
1035 |
configuration file if you find this more convenient.
|
1035 |
configuration file if you find this more convenient.
|
1036 |
|
1036 |
|
|
|
1037 |
Each result entry also has a right-click menu with an Open With entry.
|
|
|
1038 |
This lets you choose an application from the list of those which
|
|
|
1039 |
registered with the desktop for the document MIME type.
|
|
|
1040 |
|
1037 |
The Preview and Open edit links may not be present for all entries,
|
1041 |
The Preview and Open edit links may not be present for all entries,
|
1038 |
meaning that Recoll has no configured way to preview a given file type
|
1042 |
meaning that Recoll has no configured way to preview a given file type
|
1039 |
(which was indexed by name only), or no configured external editor for the
|
1043 |
(which was indexed by name only), or no configured external editor for the
|
1040 |
file type. This can sometimes be adjusted simply by tweaking the mimemap
|
1044 |
file type. This can sometimes be adjusted simply by tweaking the mimemap
|
1041 |
and mimeview configuration files (the latter can be modified with the user
|
1045 |
and mimeview configuration files (the latter can be modified with the user
|
|
... |
|
... |
1069 |
|
1073 |
|
1070 |
Apart from the preview and edit links, you can display a pop-up menu by
|
1074 |
Apart from the preview and edit links, you can display a pop-up menu by
|
1071 |
right-clicking over a paragraph in the result list. This menu has the
|
1075 |
right-clicking over a paragraph in the result list. This menu has the
|
1072 |
following entries:
|
1076 |
following entries:
|
1073 |
|
1077 |
|
1074 |
o Preview
|
1078 |
* Preview
|
1075 |
|
1079 |
|
1076 |
o Open
|
1080 |
* Open
|
1077 |
|
1081 |
|
1078 |
o Copy File Name
|
1082 |
* Copy File Name
|
1079 |
|
1083 |
|
1080 |
o Copy Url
|
1084 |
* Copy Url
|
1081 |
|
1085 |
|
1082 |
o Save to File
|
1086 |
* Save to File
|
1083 |
|
1087 |
|
1084 |
o Find similar
|
1088 |
* Find similar
|
1085 |
|
1089 |
|
1086 |
o Preview Parent document
|
1090 |
* Preview Parent document
|
1087 |
|
1091 |
|
1088 |
o Open Parent document
|
1092 |
* Open Parent document
|
1089 |
|
1093 |
|
1090 |
o Open Snippets Window
|
1094 |
* Open Snippets Window
|
1091 |
|
1095 |
|
1092 |
The Preview and Open entries do the same thing as the corresponding links.
|
1096 |
The Preview and Open entries do the same thing as the corresponding links.
|
1093 |
|
1097 |
|
1094 |
The Copy File Name and Copy Url copy the relevant data to the clipboard,
|
1098 |
The Copy File Name and Copy Url copy the relevant data to the clipboard,
|
1095 |
for later pasting.
|
1099 |
for later pasting.
|
|
... |
|
... |
1256 |
|
1260 |
|
1257 |
This part of the dialog lets you construct a query by combining multiple
|
1261 |
This part of the dialog lets you construct a query by combining multiple
|
1258 |
clauses of different types. Each entry field is configurable for the
|
1262 |
clauses of different types. Each entry field is configurable for the
|
1259 |
following modes:
|
1263 |
following modes:
|
1260 |
|
1264 |
|
1261 |
o All terms.
|
1265 |
* All terms.
|
1262 |
|
1266 |
|
1263 |
o Any term.
|
1267 |
* Any term.
|
1264 |
|
1268 |
|
1265 |
o None of the terms.
|
1269 |
* None of the terms.
|
1266 |
|
1270 |
|
1267 |
o Phrase (exact terms in order within an adjustable window).
|
1271 |
* Phrase (exact terms in order within an adjustable window).
|
1268 |
|
1272 |
|
1269 |
o Proximity (terms in any order within an adjustable window).
|
1273 |
* Proximity (terms in any order within an adjustable window).
|
1270 |
|
1274 |
|
1271 |
o Filename search.
|
1275 |
* Filename search.
|
1272 |
|
1276 |
|
1273 |
Additional entry fields can be created by clicking the Add clause button.
|
1277 |
Additional entry fields can be created by clicking the Add clause button.
|
1274 |
|
1278 |
|
1275 |
When searching, the non-empty clauses will be combined either with an AND
|
1279 |
When searching, the non-empty clauses will be combined either with an AND
|
1276 |
or an OR conjunction, depending on the choice made on the left (All
|
1280 |
or an OR conjunction, depending on the choice made on the left (All
|
|
... |
|
... |
1295 |
3.1.6.2. Advanced search: the "filter" tab
|
1299 |
3.1.6.2. Advanced search: the "filter" tab
|
1296 |
|
1300 |
|
1297 |
This part of the dialog has several sections which allow filtering the
|
1301 |
This part of the dialog has several sections which allow filtering the
|
1298 |
results of a search according to a number of criteria:
|
1302 |
results of a search according to a number of criteria:
|
1299 |
|
1303 |
|
1300 |
o The first section allows filtering by dates of last modification. You
|
1304 |
* The first section allows filtering by dates of last modification. You
|
1301 |
can specify both a minimum and a maximum date. The initial values are
|
1305 |
can specify both a minimum and a maximum date. The initial values are
|
1302 |
set according to the oldest and newest documents found in the index.
|
1306 |
set according to the oldest and newest documents found in the index.
|
1303 |
|
1307 |
|
1304 |
o The next section allows filtering the results by file size. There are
|
1308 |
* The next section allows filtering the results by file size. There are
|
1305 |
two entries for minimum and maximum size. Enter decimal numbers. You
|
1309 |
two entries for minimum and maximum size. Enter decimal numbers. You
|
1306 |
can use suffix multipliers: k/K, m/M, g/G, t/T for 1E3, 1E6, 1E9, 1E12
|
1310 |
can use suffix multipliers: k/K, m/M, g/G, t/T for 1E3, 1E6, 1E9, 1E12
|
1307 |
respectively.
|
1311 |
respectively.
|
1308 |
|
1312 |
|
1309 |
o The next section allows filtering the results by their MIME types, or
|
1313 |
* The next section allows filtering the results by their MIME types, or
|
1310 |
MIME categories (ie: media/text/message/etc.).
|
1314 |
MIME categories (ie: media/text/message/etc.).
|
1311 |
|
1315 |
|
1312 |
You can transfer the types between two boxes, to define which will be
|
1316 |
You can transfer the types between two boxes, to define which will be
|
1313 |
included or excluded by the search.
|
1317 |
included or excluded by the search.
|
1314 |
|
1318 |
|
1315 |
The state of the file type selection can be saved as the default (the
|
1319 |
The state of the file type selection can be saved as the default (the
|
1316 |
file type filter will not be activated at program start-up, but the
|
1320 |
file type filter will not be activated at program start-up, but the
|
1317 |
lists will be in the restored state).
|
1321 |
lists will be in the restored state).
|
1318 |
|
1322 |
|
1319 |
o The bottom section allows restricting the search results to a sub-tree
|
1323 |
* The bottom section allows restricting the search results to a sub-tree
|
1320 |
of the indexed area. You can use the Invert checkbox to search for
|
1324 |
of the indexed area. You can use the Invert checkbox to search for
|
1321 |
files not in the sub-tree instead. If you use directory filtering
|
1325 |
files not in the sub-tree instead. If you use directory filtering
|
1322 |
often and on big subsets of the file system, you may think of setting
|
1326 |
often and on big subsets of the file system, you may think of setting
|
1323 |
up multiple indexes instead, as the performance may be better.
|
1327 |
up multiple indexes instead, as the performance may be better.
|
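As an illustration of the size suffix multipliers described in the list above, here is a minimal Python sketch that converts an entry such as 10k or 2G into a number of bytes. The function name parse_size is hypothetical, and the sketch assumes that "decimal numbers" means base-10 integers.

```python
# Suffix multipliers as listed above: k/K, m/M, g/G, t/T
MULTIPLIERS = {"k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12}


def parse_size(text):
    """Convert a size entry like '10k' or '2G' to a number of bytes."""
    text = text.strip()
    suffix = text[-1:].lower()
    if suffix in MULTIPLIERS:
        # Strip the suffix and scale the remaining integer
        return int(text[:-1]) * MULTIPLIERS[suffix]
    # No suffix: the entry is a plain byte count
    return int(text)
```

Note that the multipliers are powers of ten, so 10k is 10000 bytes here, not 10240.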
1324 |
|
1328 |
|
|
... |
|
... |
1553 |
which will let you adjust what columns are displayed. You can drag the
|
1557 |
which will let you adjust what columns are displayed. You can drag the
|
1554 |
column headers to adjust their order. You can click them to sort by the
|
1558 |
column headers to adjust their order. You can click them to sort by the
|
1555 |
field displayed in the column. You can also save the result list in CSV
|
1559 |
field displayed in the column. You can also save the result list in CSV
|
1556 |
format.
|
1560 |
format.
|
1557 |
|
1561 |
|
|
|
1562 |
Changing the GUI geometry. It is possible to configure the GUI in wide
|
|
|
1563 |
form factor by dragging the toolbars to one of the sides (their location
|
|
|
1564 |
is remembered between sessions), and moving the category filters to a menu
|
|
|
1565 |
(can be set in the Preferences -> GUI configuration -> User interface
|
|
|
1566 |
panel).
|
|
|
1567 |
|
1558 |
Query explanation. You can get an exact description of what the query
|
1568 |
Query explanation. You can get an exact description of what the query
|
1559 |
looked for, including stem expansion, and Boolean operators used, by
|
1569 |
looked for, including stem expansion, and Boolean operators used, by
|
1560 |
clicking on the result list header.
|
1570 |
clicking on the result list header.
|
1561 |
|
1571 |
|
1562 |
Advanced search history. As of Recoll 1.18, you can display any of the
|
1572 |
Advanced search history. As of Recoll 1.18, you can display any of the
|
|
... |
|
... |
1599 |
the parameters used for searching and returning results, and what indexes
|
1609 |
the parameters used for searching and returning results, and what indexes
|
1600 |
are searched.
|
1610 |
are searched.
|
1601 |
|
1611 |
|
1602 |
User interface parameters:
|
1612 |
User interface parameters:
|
1603 |
|
1613 |
|
1604 |
o Highlight color for query terms: Terms from the user query are
|
1614 |
* Highlight color for query terms: Terms from the user query are
|
1605 |
highlighted in the result list samples and the preview window. The
|
1615 |
highlighted in the result list samples and the preview window. The
|
1606 |
color can be chosen here. Any Qt color string should work (ie red,
|
1616 |
color can be chosen here. Any Qt color string should work (ie red,
|
1607 |
#ff0000). The default is blue.
|
1617 |
#ff0000). The default is blue.
|
1608 |
|
1618 |
|
1609 |
o Style sheet: The name of a Qt style sheet text file which is applied
|
1619 |
* Style sheet: The name of a Qt style sheet text file which is applied
|
1610 |
to the whole Recoll application on startup. The default value is
|
1620 |
to the whole Recoll application on startup. The default value is
|
1611 |
empty, but there is a skeleton style sheet (recoll.qss) inside the
|
1621 |
empty, but there is a skeleton style sheet (recoll.qss) inside the
|
1612 |
/usr/share/recoll/examples directory. Using a style sheet, you can
|
1622 |
/usr/share/recoll/examples directory. Using a style sheet, you can
|
1613 |
change most recoll graphical parameters: colors, fonts, etc. See the
|
1623 |
change most recoll graphical parameters: colors, fonts, etc. See the
|
1614 |
sample file for a few simple examples.
|
1624 |
sample file for a few simple examples.
|
|
... |
|
... |
1619 |
set the foreground to a light color and the background to a dark one
|
1629 |
set the foreground to a light color and the background to a dark one
|
1620 |
in the desktop preferences, but only the background is set inside the
|
1630 |
in the desktop preferences, but only the background is set inside the
|
1621 |
Recoll style sheet, and it is light too, then text will appear
|
1631 |
Recoll style sheet, and it is light too, then text will appear
|
1622 |
light-on-light inside the Recoll GUI.
|
1632 |
light-on-light inside the Recoll GUI.
|
1623 |
|
1633 |
|
1624 |
o Maximum text size highlighted for preview: inserting highlights on
|
1634 |
* Maximum text size highlighted for preview: inserting highlights on
|
1625 |
search term inside the text before inserting it in the preview window
|
1635 |
search term inside the text before inserting it in the preview window
|
1626 |
involves quite a lot of processing, and can be disabled over the given
|
1636 |
involves quite a lot of processing, and can be disabled over the given
|
1627 |
text size to speed up loading.
|
1637 |
text size to speed up loading.
|
1628 |
|
1638 |
|
1629 |
o Prefer HTML to plain text for preview: if set, Recoll will display HTML
|
1639 |
* Prefer HTML to plain text for preview: if set, Recoll will display HTML
|
1630 |
as such inside the preview window. If this causes problems with the Qt
|
1640 |
as such inside the preview window. If this causes problems with the Qt
|
1631 |
HTML display, you can uncheck it to display the plain text version
|
1641 |
HTML display, you can uncheck it to display the plain text version
|
1632 |
instead.
|
1642 |
instead.
|
1633 |
|
1643 |
|
1634 |
o Plain text to HTML line style: when displaying plain text inside the
|
1644 |
* Plain text to HTML line style: when displaying plain text inside the
|
1635 |
preview window, Recoll tries to preserve some of the original text
|
1645 |
preview window, Recoll tries to preserve some of the original text
|
1636 |
line breaks and indentation. It can either use PRE HTML tags, which
|
1646 |
line breaks and indentation. It can either use PRE HTML tags, which
|
1637 |
will well preserve the indentation but will force horizontal scrolling
|
1647 |
will well preserve the indentation but will force horizontal scrolling
|
1638 |
for long lines, or use BR tags to break at the original line breaks,
|
1648 |
for long lines, or use BR tags to break at the original line breaks,
|
1639 |
which will let the editor introduce other line breaks according to the
|
1649 |
which will let the editor introduce other line breaks according to the
|
1640 |
window width, but will lose some of the original indentation. The
|
1650 |
window width, but will lose some of the original indentation. The
|
1641 |
third option has been available in recent releases and is probably now
|
1651 |
third option has been available in recent releases and is probably now
|
1642 |
the best one: use PRE tags with line wrapping.
|
1652 |
the best one: use PRE tags with line wrapping.
|
1643 |
|
1653 |
|
1644 |
o Use desktop preferences to choose document editor: if this is checked,
|
1654 |
* Use desktop preferences to choose document editor: if this is checked,
|
1645 |
the xdg-open utility will be used to open files when you click the
|
1655 |
the xdg-open utility will be used to open files when you click the
|
1646 |
Open link in the result list, instead of the application defined in
|
1656 |
Open link in the result list, instead of the application defined in
|
1647 |
mimeview. xdg-open will in turn use your desktop preferences to choose
|
1657 |
mimeview. xdg-open will in turn use your desktop preferences to choose
|
1648 |
an appropriate application.
|
1658 |
an appropriate application.
|
1649 |
|
1659 |
|
1650 |
o Exceptions: when using the desktop preferences for opening documents,
|
1660 |
* Exceptions: when using the desktop preferences for opening documents,
|
1651 |
these are MIME types that will still be opened according to Recoll
|
1661 |
these are MIME types that will still be opened according to Recoll
|
1652 |
preferences. This is useful for passing parameters like page numbers
|
1662 |
preferences. This is useful for passing parameters like page numbers
|
1653 |
or search strings to applications that support them (e.g. evince).
|
1663 |
or search strings to applications that support them (e.g. evince).
|
1654 |
This cannot be done with xdg-open which only supports passing one
|
1664 |
This cannot be done with xdg-open which only supports passing one
|
1655 |
parameter.
|
1665 |
parameter.
|
1656 |
|
1666 |
|
1657 |
o Choose editor applications: this will let you choose the command
|
1667 |
* Choose editor applications: this will let you choose the command
|
1658 |
started by the Open links inside the result list, for specific
|
1668 |
started by the Open links inside the result list, for specific
|
1659 |
document types.
|
1669 |
document types.
|
1660 |
|
1670 |
|
1661 |
o Display category filter as toolbar... this will let you choose if the
|
1671 |
* Display category filter as toolbar... this will let you choose if the
|
1662 |
document categories are displayed as a list or a set of buttons.
|
1672 |
document categories are displayed as a list or a set of buttons.
|
1663 |
|
1673 |
|
1664 |
o Auto-start simple search on white space entry: if this is checked, a
|
1674 |
* Auto-start simple search on white space entry: if this is checked, a
|
1665 |
search will be executed each time you enter a space in the simple
|
1675 |
search will be executed each time you enter a space in the simple
|
1666 |
search input field. This lets you look at the result list as you enter
|
1676 |
search input field. This lets you look at the result list as you enter
|
1667 |
new terms. This is off by default, you may like it or not...
|
1677 |
new terms. This is off by default, you may like it or not...
|
1668 |
|
1678 |
|
1669 |
o Start with advanced search dialog open: if you use this dialog
|
1679 |
* Start with advanced search dialog open: if you use this dialog
|
1670 |
frequently, checking the entries will get it to open when recoll
|
1680 |
frequently, checking the entries will get it to open when recoll
|
1671 |
starts.
|
1681 |
starts.
|
1672 |
|
1682 |
|
1673 |
o Remember sort activation state: if set, Recoll will remember the sort
|
1683 |
* Remember sort activation state: if set, Recoll will remember the sort
|
1674 |
tool state between invocations. It normally starts with sorting
|
1684 |
tool state between invocations. It normally starts with sorting
|
1675 |
disabled.
|
1685 |
disabled.
|
1676 |
|
1686 |
|
1677 |
Result list parameters:
|
1687 |
Result list parameters:
|
1678 |
|
1688 |
|
1679 |
o Number of results in a result page
|
1689 |
* Number of results in a result page
|
1680 |
|
1690 |
|
1681 |
o Result list font: There is quite a lot of information shown in the
|
1691 |
* Result list font: There is quite a lot of information shown in the
|
1682 |
result list, and you may want to customize the font and/or font size.
|
1692 |
result list, and you may want to customize the font and/or font size.
|
1683 |
The rest of the fonts used by Recoll are determined by your generic Qt
|
1693 |
The rest of the fonts used by Recoll are determined by your generic Qt
|
1684 |
config (try the qtconfig command).
|
1694 |
config (try the qtconfig command).
|
1685 |
|
1695 |
|
1686 |
o Edit result list paragraph format string: allows you to change the
|
1696 |
* Edit result list paragraph format string: allows you to change the
|
1687 |
presentation of each result list entry. See the result list
|
1697 |
presentation of each result list entry. See the result list
|
1688 |
customisation section.
|
1698 |
customisation section.
|
1689 |
|
1699 |
|
1690 |
o Edit result page HTML header insert: allows you to define text
|
1700 |
* Edit result page HTML header insert: allows you to define text
|
1691 |
inserted at the end of the result page HTML header. More detail in the
|
1701 |
inserted at the end of the result page HTML header. More detail in the
|
1692 |
result list customisation section.
|
1702 |
result list customisation section.
|
1693 |
|
1703 |
|
1694 |
o Date format: allows specifying the format used for displaying dates
|
1704 |
* Date format: allows specifying the format used for displaying dates
|
1695 |
inside the result list. This should be specified as an strftime()
|
1705 |
inside the result list. This should be specified as an strftime()
|
1696 |
string (man strftime).
|
1706 |
string (man strftime).
|
1697 |
|
1707 |
|
1698 |
o Abstract snippet separator: for synthetic abstracts built from index
|
1708 |
* Abstract snippet separator: for synthetic abstracts built from index
|
1699 |
data, which are usually made of several snippets from different parts
|
1709 |
data, which are usually made of several snippets from different parts
|
1700 |
of the document, this defines the snippet separator, an ellipsis by
|
1710 |
of the document, this defines the snippet separator, an ellipsis by
|
1701 |
default.
|
1711 |
default.
|
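To get a feel for the strftime() strings mentioned for the Date format entry above, the following Python sketch formats a fixed sample date with two format strings. Python's strftime accepts the same common conversion specifiers (%Y, %m, %d, %H, %M) as the C function; the helper name format_result_date and the sample formats are only illustrations, not Recoll defaults.

```python
from datetime import datetime


def format_result_date(when, date_format="%Y-%m-%d %H:%M"):
    """Format a date the way a strftime() format string would."""
    return when.strftime(date_format)


sample = datetime(2014, 1, 15, 9, 30)
print(format_result_date(sample))              # 2014-01-15 09:30
print(format_result_date(sample, "%d/%m/%Y"))  # 15/01/2014
```

Only numeric specifiers are used here, so the output does not depend on the locale.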
1702 |
|
1712 |
|
1703 |
Search parameters:
|
1713 |
Search parameters:
|
1704 |
|
1714 |
|
1705 |
o Hide duplicate results: decides if result list entries are shown for
|
1715 |
* Hide duplicate results: decides if result list entries are shown for
|
1706 |
identical documents found in different places.
|
1716 |
identical documents found in different places.
|
1707 |
|
1717 |
|
1708 |
o Stemming language: stemming obviously depends on the document's
|
1718 |
* Stemming language: stemming obviously depends on the document's
|
1709 |
language. This listbox will let you choose among the stemming databases
|
1719 |
language. This listbox will let you choose among the stemming databases
|
1710 |
which were built during indexing (this is set in the main
|
1720 |
which were built during indexing (this is set in the main
|
1711 |
configuration file), or later added with recollindex -s (See the
|
1721 |
configuration file), or later added with recollindex -s (See the
|
1712 |
recollindex manual). Stemming languages which are dynamically added
|
1722 |
recollindex manual). Stemming languages which are dynamically added
|
1713 |
will be deleted at the next indexing pass unless they are also added
|
1723 |
will be deleted at the next indexing pass unless they are also added
|
1714 |
in the configuration file.
|
1724 |
in the configuration file.
|
1715 |
|
1725 |
|
1716 |
o Automatically add phrase to simple searches: a phrase will be
|
1726 |
* Automatically add phrase to simple searches: a phrase will be
|
1717 |
automatically built and added to simple searches when looking for Any
|
1727 |
automatically built and added to simple searches when looking for Any
|
1718 |
terms. This will give a relevance boost to the results where the
|
1728 |
terms. This will give a relevance boost to the results where the
|
1719 |
search terms appear as a phrase (consecutive and in order).
|
1729 |
search terms appear as a phrase (consecutive and in order).
|
1720 |
|
1730 |
|
1721 |
o Autophrase term frequency threshold percentage: very frequent terms
|
1731 |
* Autophrase term frequency threshold percentage: very frequent terms
|
1722 |
should not be included in automatic phrase searches for performance
|
1732 |
should not be included in automatic phrase searches for performance
|
1723 |
reasons. The parameter defines the cutoff percentage (percentage of
|
1733 |
reasons. The parameter defines the cutoff percentage (percentage of
|
1724 |
the documents where the term appears).
|
1734 |
the documents where the term appears).
|
1725 |
|
1735 |
|
1726 |
o Replace abstracts from documents: this decides if we should synthesize
|
1736 |
* Replace abstracts from documents: this decides if we should synthesize
|
1727 |
and display an abstract in place of an explicit abstract found within
|
1737 |
and display an abstract in place of an explicit abstract found within
|
1728 |
the document itself.
|
1738 |
the document itself.
|
1729 |
|
1739 |
|
1730 |
o Dynamically build abstracts: this decides if Recoll tries to build
|
1740 |
* Dynamically build abstracts: this decides if Recoll tries to build
|
1731 |
document abstracts (lists of snippets) when displaying the result
|
1741 |
document abstracts (lists of snippets) when displaying the result
|
1732 |
list. Abstracts are constructed by taking context from the document
|
1742 |
list. Abstracts are constructed by taking context from the document
|
1733 |
information, around the search terms.
|
1743 |
information, around the search terms.
|
1734 |
|
1744 |
|
1735 |
o Synthetic abstract size: adjust to taste...
|
1745 |
* Synthetic abstract size: adjust to taste...
|
1736 |
|
1746 |
|
1737 |
o Synthetic abstract context words: how many words should be displayed
|
1747 |
* Synthetic abstract context words: how many words should be displayed
|
1738 |
around each term occurrence.
|
1748 |
around each term occurrence.
|
1739 |
|
1749 |
|
1740 |
o Query language magic file name suffixes: a list of words which
|
1750 |
* Query language magic file name suffixes: a list of words which
|
1741 |
automatically get turned into ext:xxx file name suffix clauses when
|
1751 |
automatically get turned into ext:xxx file name suffix clauses when
|
1742 |
starting a query language query (e.g. doc xls xlsx...). This will save
|
1752 |
starting a query language query (e.g. doc xls xlsx...). This will save
|
1743 |
some typing for people who use file types a lot when querying.
|
1753 |
some typing for people who use file types a lot when querying.
|
1744 |
|
1754 |
|
1745 |
External indexes: This panel will let you browse for additional indexes
|
1755 |
External indexes: This panel will let you browse for additional indexes
|
|
... |
|
... |
1760 |
3.1.12.1. The result list format
|
1770 |
3.1.12.1. The result list format
|
1761 |
|
1771 |
|
1762 |
The result list presentation can be exhaustively customized by adjusting
|
1772 |
The result list presentation can be exhaustively customized by adjusting
|
1763 |
two elements:
|
1773 |
two elements:
|
1764 |
|
1774 |
|
1765 |
o The paragraph format
|
1775 |
* The paragraph format
|
1766 |
|
1776 |
|
1767 |
o HTML code inside the header section
|
1777 |
* HTML code inside the header section
|
1768 |
|
1778 |
|
1769 |
These can be edited from the Result list tab of the GUI configuration.
|
1779 |
These can be edited from the Result list tab of the GUI configuration.
|
1770 |
|
1780 |
|
1771 |
Newer versions of Recoll (from 1.17) use a WebKit HTML object by default
|
1781 |
Newer versions of Recoll (from 1.17) use a WebKit HTML object by default
|
1772 |
(this may be disabled at build time), and total customisation is possible
|
1782 |
(this may be disabled at build time), and total customisation is possible
|
|
... |
|
... |
1784 |
The paragraph format
|
1794 |
The paragraph format
|
1785 |
|
1795 |
|
1786 |
This is an arbitrary HTML string where the following printf-like %
|
1796 |
This is an arbitrary HTML string where the following printf-like %
|
1787 |
substitutions will be performed:
|
1797 |
substitutions will be performed:
|
1788 |
|
1798 |
|
1789 |
o %A. Abstract
|
1799 |
* %A. Abstract
|
1790 |
|
1800 |
|
1791 |
o %D. Date
|
1801 |
* %D. Date
|
1792 |
|
1802 |
|
1793 |
o %I. Icon image name. This is normally determined from the MIME type.
|
1803 |
* %I. Icon image name. This is normally determined from the MIME type.
|
1794 |
The associations are defined inside the mimeconf configuration file.
|
1804 |
The associations are defined inside the mimeconf configuration file.
|
1795 |
If a thumbnail for the file is found at the standard Freedesktop
|
1805 |
If a thumbnail for the file is found at the standard Freedesktop
|
1796 |
location, this will be displayed instead.
|
1806 |
location, this will be displayed instead.
|
1797 |
|
1807 |
|
1798 |
o %K. Keywords (if any)
|
1808 |
* %K. Keywords (if any)
|
1799 |
|
1809 |
|
1800 |
o %L. Precooked Preview, Edit, and possibly Snippets links
|
1810 |
* %L. Precooked Preview, Edit, and possibly Snippets links
|
1801 |
|
1811 |
|
1802 |
o %M. MIME type
|
1812 |
* %M. MIME type
|
1803 |
|
1813 |
|
1804 |
o %N. result Number inside the result page
|
1814 |
* %N. result Number inside the result page
|
1805 |
|
1815 |
|
|
|
1816 |
* %P. Parent folder Url. In the case of an embedded document, this is
|
|
|
1817 |
the parent folder for the top level container file.
|
|
|
1818 |
|
1806 |
o %R. Relevance percentage
|
1819 |
* %R. Relevance percentage
|
1807 |
|
1820 |
|
1808 |
o %S. Size information
|
1821 |
* %S. Size information
|
1809 |
|
1822 |
|
1810 |
o %T. Title or Filename if not set.
|
1823 |
* %T. Title or Filename if not set.
|
1811 |
|
1824 |
|
1812 |
o %t. Title or Filename if not set.
|
1825 |
* %t. Title or Filename if not set.
|
1813 |
|
1826 |
|
1814 |
o %U. Url
|
1827 |
* %U. Url
|
1815 |
|
1828 |
|
1816 |
The format of the Preview, Edit, and Snippets links is <a href="P%N">, <a
|
1829 |
The format of the Preview, Edit, and Snippets links is <a href="P%N">, <a
|
1817 |
href="E%N"> and <a href="A%N"> where docnum (%N) expands to the document
|
1830 |
href="E%N"> and <a href="A%N"> where docnum (%N) expands to the document
|
1818 |
number inside the result page.
|
1831 |
number inside the result page.
|
|
|
1832 |
|
|
|
1833 |
It is also possible to use a "F%N" value as a link target. This will open
|
|
|
1834 |
the document corresponding to the %P parent folder expansion, usually
|
|
|
1835 |
creating a file manager window on the folder where the container file
|
|
|
1836 |
resides. E.g.:
|
|
|
1837 |
|
|
|
1838 |
<a href="F%N">%P</a>
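
The substitution mechanism can be pictured with a small Python sketch. This is not Recoll's implementation: the function name, the sample values and the choice to leave unknown letters untouched are all assumptions made for illustration.

```python
import re

def expand_paragraph_format(fmt, values):
    """Replace single-letter %X placeholders in fmt with entries from values.

    Letters without an entry in values are left as they are (an assumption
    of this sketch, not documented Recoll behaviour).
    """
    def repl(match):
        return values.get(match.group(1), match.group(0))
    return re.sub(r"%([A-Za-z])", repl, fmt)

# Hypothetical values for one result list entry.
entry = {
    "N": "3",
    "T": "Trip report",
    "D": "2014-03-09",
    "P": "file:///home/me/docs",
    "A": "a short abstract",
}

fmt = '<a href="P%N">%T</a> %D <a href="F%N">%P</a><br>%A'
html = expand_paragraph_format(fmt, entry)
print(html)
```

With these values, the P%N and F%N link targets become P3 and F3, which is how the link identifies the document inside the result page.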
|
1819 |
|
1839 |
|
1820 |
In addition to the predefined values above, all strings like %(fieldname)
|
1840 |
In addition to the predefined values above, all strings like %(fieldname)
|
1821 |
will be replaced by the value of the field named fieldname for this
|
1841 |
will be replaced by the value of the field named fieldname for this
|
1822 |
document. Only stored fields can be accessed in this way, the value of
|
1842 |
document. Only stored fields can be accessed in this way, the value of
|
1823 |
indexed but not stored fields is not known at this point in the search
|
1843 |
indexed but not stored fields is not known at this point in the search
|
|
... |
|
... |
1906 |
3.3. Searching on the command line
|
1926 |
3.3. Searching on the command line
|
1907 |
|
1927 |
|
1908 |
There are several ways to obtain search results as a text stream, without
|
1928 |
There are several ways to obtain search results as a text stream, without
|
1909 |
a graphical interface:
|
1929 |
a graphical interface:
|
1910 |
|
1930 |
|
1911 |
o By passing option -t to the recoll program.
|
1931 |
* By passing option -t to the recoll program.
|
1912 |
|
1932 |
|
1913 |
o By using the recollq program.
|
1933 |
* By using the recollq program.
|
1914 |
|
1934 |
|
1915 |
o By writing a custom Python program, using the Recoll Python API.
|
1935 |
* By writing a custom Python program, using the Recoll Python API.
|
1916 |
|
1936 |
|
1917 |
The first two methods work in the same way and accept/need the same
|
1937 |
The first two methods work in the same way and accept/need the same
|
1918 |
arguments (except for the additional -t to recoll). The query to be
|
1938 |
arguments (except for the additional -t to recoll). The query to be
|
1919 |
executed is specified as command line arguments.
|
1939 |
executed is specified as command line arguments.
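
As a minimal sketch of driving these two commands from a script, the Python helper below only builds the argument list. The helper name is hypothetical, and no options other than -t are assumed.

```python
import shlex

def build_search_command(query_terms, use_recollq=True):
    """Return the argument list for a text-mode search.

    The query terms are passed as plain command line arguments, either to
    recollq or to recoll with the additional -t option.
    """
    if use_recollq:
        return ["recollq"] + list(query_terms)
    return ["recoll", "-t"] + list(query_terms)

cmd = build_search_command(["author:lennon", "ext:pdf"])
# Show the command as it could be typed in a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

If Recoll is installed, the list can be handed to subprocess.run(cmd, capture_output=True, text=True) to collect the results as text.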
|
1920 |
|
1940 |
|
|
... |
|
... |
1976 |
|
1996 |
|
1977 |
In some cases, the document paths stored inside the index do not match the
|
1997 |
In some cases, the document paths stored inside the index do not match the
|
1978 |
actual ones, so that document previews and accesses will fail. This can
|
1998 |
actual ones, so that document previews and accesses will fail. This can
|
1979 |
occur in a number of circumstances:
|
1999 |
occur in a number of circumstances:
|
1980 |
|
2000 |
|
1981 |
o When using multiple indexes it is a relatively common occurrence that
|
2001 |
* When using multiple indexes it is a relatively common occurrence that
|
1982 |
some will actually reside on a remote volume, for example mounted via
|
2002 |
some will actually reside on a remote volume, for example mounted via
|
1983 |
NFS. In this case, the paths used to access the documents on the local
|
2003 |
NFS. In this case, the paths used to access the documents on the local
|
1984 |
machine are not necessarily the same as the ones used while indexing
|
2004 |
machine are not necessarily the same as the ones used while indexing
|
1985 |
on the remote machine. For example, /home/me may have been used as a
|
2005 |
on the remote machine. For example, /home/me may have been used as a
|
1986 |
topdirs element while indexing, but the directory might be mounted as
|
2006 |
topdirs element while indexing, but the directory might be mounted as
|
1987 |
/net/server/home/me on the local machine.
|
2007 |
/net/server/home/me on the local machine.
|
1988 |
|
2008 |
|
1989 |
o The case may also occur with removable disks. It is perfectly possible
|
2009 |
* The case may also occur with removable disks. It is perfectly possible
|
1990 |
to configure an index to live with the documents on the removable
|
2010 |
to configure an index to live with the documents on the removable
|
1991 |
disk, but it may happen that the disk is not mounted at the same place
|
2011 |
disk, but it may happen that the disk is not mounted at the same place
|
1992 |
so that the document paths from the index are invalid.
|
2012 |
so that the document paths from the index are invalid.
|
1993 |
|
2013 |
|
1994 |
o As a last example, one could imagine that a big directory has been
|
2014 |
* As a last example, one could imagine that a big directory has been
|
1995 |
moved, but that it is currently inconvenient to run the indexer.
|
2015 |
moved, but that it is currently inconvenient to run the indexer.
|
1996 |
|
2016 |
|
1997 |
More generally, the path translation facility may be useful whenever the
|
2017 |
More generally, the path translation facility may be useful whenever the
|
1998 |
document paths seen by the indexer are not the same as the ones which
|
2018 |
document paths seen by the indexer are not the same as the ones which
|
1999 |
should be used at query time.
|
2019 |
should be used at query time.
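
The idea behind path translation can be sketched in Python as a prefix rewrite. This illustrates the principle only: it does not reproduce the ptrans file format or Recoll's own code, and the function name and sample paths are assumptions.

```python
def translate_path(path, translations):
    """Rewrite path using the longest matching index-time prefix.

    translations maps a prefix as stored in the index to the prefix that
    should be used at query time. Paths matching no prefix are returned
    unchanged.
    """
    best = None
    for old_prefix in translations:
        stripped = old_prefix.rstrip("/")
        # Match whole path components only: /home/me must not match /home/meow.
        if path == stripped or path.startswith(stripped + "/"):
            if best is None or len(stripped) > len(best.rstrip("/")):
                best = old_prefix
    if best is None:
        return path
    stripped = best.rstrip("/")
    return translations[best].rstrip("/") + path[len(stripped):]

# Index built on the server with /home/me, mounted locally under /net/server.
translations = {"/home/me": "/net/server/home/me"}
local = translate_path("/home/me/docs/report.pdf", translations)
print(local)
```

Choosing the longest matching prefix lets a specific translation for a sub-directory take precedence over a more general one.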
|
|
... |
|
... |
2055 |
|
2075 |
|
2056 |
As usual, words inside quotes define a phrase (the order of words is
|
2076 |
As usual, words inside quotes define a phrase (the order of words is
|
2057 |
significant), so that title:"prejudice pride" is not the same as
|
2077 |
significant), so that title:"prejudice pride" is not the same as
|
2058 |
title:prejudice title:pride, and is unlikely to find a result.
|
2078 |
title:prejudice title:pride, and is unlikely to find a result.
|
2059 |
|
2079 |
|
|
|
2080 |
To save you some typing, recent Recoll versions (1.20 and later) interpret
|
|
|
2081 |
a comma-separated list of terms as an AND list inside the field. Use slash
|
|
|
2082 |
characters ('/') for an OR list. No white space is allowed. So
|
|
|
2083 |
|
|
|
2084 |
author:john,lennon
|
|
|
2085 |
|
|
|
2086 |
will search for documents with john and lennon inside the author field (in
|
|
|
2087 |
any order), and
|
|
|
2088 |
|
|
|
2089 |
author:john/ringo
|
|
|
2090 |
|
|
|
2091 |
would search for john or ringo.
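
The two shorthand forms can be pictured as an expansion into one clause per term. The Python sketch below is only an illustration: the function name is hypothetical, mixing commas and slashes in one value is not handled, and special criteria such as dir, whose values contain slashes, are out of scope.

```python
def expand_field_list(clause):
    """Expand 'field:a,b' or 'field:a/b' into (operator, [(field, term), ...]).

    A comma-separated value becomes an AND list, a slash-separated value
    becomes an OR list, and a single term is returned as a one-element
    AND list.
    """
    field, _, value = clause.partition(":")
    if "," in value:
        operator, terms = "AND", value.split(",")
    elif "/" in value:
        operator, terms = "OR", value.split("/")
    else:
        operator, terms = "AND", [value]
    return operator, [(field, term) for term in terms]

and_list = expand_field_list("author:john,lennon")
or_list = expand_field_list("author:john/ringo")
print(and_list)
print(or_list)
```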
|
|
|
2092 |
|
2060 |
Modifiers can be set on a phrase clause, for example to specify a
|
2093 |
Modifiers can be set on a phrase clause, for example to specify a
|
2061 |
proximity search (unordered). See the modifier section.
|
2094 |
proximity search (unordered). See the modifier section.
|
2062 |
|
2095 |
|
2063 |
Recoll currently manages the following default fields:
|
2096 |
Recoll currently manages the following default fields:
|
2064 |
|
2097 |
|
2065 |
o title, subject or caption are synonyms which specify data to be
|
2098 |
* title, subject or caption are synonyms which specify data to be
|
2066 |
searched for in the document title or subject.
|
2099 |
searched for in the document title or subject.
|
2067 |
|
2100 |
|
2068 |
o author or from for searching the document originators.
|
2101 |
* author or from for searching the document originators.
|
2069 |
|
2102 |
|
2070 |
o recipient or to for searching the document recipients.
|
2103 |
* recipient or to for searching the document recipients.
|
2071 |
|
2104 |
|
2072 |
o keyword for searching the document-specified keywords (few documents
|
2105 |
* keyword for searching the document-specified keywords (few documents
|
2073 |
actually have any).
|
2106 |
actually have any).
|
2074 |
|
2107 |
|
2075 |
o filename for the document's file name.
|
2108 |
* filename for the document's file name. This is not necessarily set for
|
|
|
2109 |
all documents: internal documents contained inside a compound one (for
|
|
|
2110 |
example an EPUB section) do not inherit the container file name any
|
|
|
2111 |
more; this was replaced by an explicit field (see next). Sub-documents
|
|
|
2112 |
can still have a specific filename, if it is implied by the document
|
|
|
2113 |
format, for example the attachment file name for an email attachment.
|
2076 |
|
2114 |
|
|
|
2115 |
* containerfilename. This is set for all documents, both top-level and
|
|
|
2116 |
contained sub-documents, and is always the name of the filesystem
|
|
|
2117 |
directory entry which contains the data. The terms from this field can
|
|
|
2118 |
only be matched by an explicit field specification (as opposed to
|
|
|
2119 |
terms from filename which are also indexed as general document
|
|
|
2120 |
content). This avoids getting matches for all the sub-documents when
|
|
|
2121 |
searching for the container file name.
|
|
|
2122 |
|
2077 |
o ext specifies the file name extension (Ex: ext:html)
|
2123 |
* ext specifies the file name extension (Ex: ext:html)
|
|
|
2124 |
|
|
|
2125 |
Recoll 1.20 and later have a way to specify aliases for the field names,
|
|
|
2126 |
which will save typing, for example by aliasing filename to fn or
|
|
|
2127 |
containerfilename to cfn. See the section about the fields file.
|
2078 |
|
2128 |
|
2079 |
The field syntax also supports a few field-like, but special, criteria:
|
2129 |
The field syntax also supports a few field-like, but special, criteria:
|
2080 |
|
2130 |
|
2081 |
o dir for filtering the results on file location (Ex:
|
2131 |
* dir for filtering the results on file location (Ex:
|
2082 |
dir:/home/me/somedir). -dir also works to find results not in the
|
2132 |
dir:/home/me/somedir). -dir also works to find results not in the
|
2083 |
specified directory (release >= 1.15.8). Tilde expansion will be
|
2133 |
specified directory (release >= 1.15.8). Tilde expansion will be
|
2084 |
performed as usual (except for a bug in versions 1.19 to 1.19.11p1).
|
2134 |
performed as usual (except for a bug in versions 1.19 to 1.19.11p1).
|
2085 |
Wildcards will be expanded, but please have a look at an important
|
2135 |
Wildcards will be expanded, but please have a look at an important
|
2086 |
limitation of wildcards in path filters.
|
2136 |
limitation of wildcards in path filters.
|
|
... |
|
... |
2108 |
and are best avoided.
|
2158 |
and are best avoided.
|
2109 |
|
2159 |
|
2110 |
You need to use double-quotes around the path value if it contains
|
2160 |
You need to use double-quotes around the path value if it contains
|
2111 |
space characters.
|
2161 |
space characters.
|
2112 |
|
2162 |
|
2113 |
o size for filtering the results on file size. Example: size<10000. You
|
2163 |
* size for filtering the results on file size. Example: size<10000. You
|
2114 |
can use <, > or = as operators. You can specify a range like the
|
2164 |
can use <, > or = as operators. You can specify a range like the
|
2115 |
following: size>100 size<1000. The usual k/K, m/M, g/G, t/T can be
|
2165 |
following: size>100 size<1000. The usual k/K, m/M, g/G, t/T can be
|
2116 |
used as (decimal) multipliers. Ex: size>1k to search for files bigger
|
2166 |
used as (decimal) multipliers. Ex: size>1k to search for files bigger
|
2117 |
than 1000 bytes.
|
2167 |
than 1000 bytes.
|
2118 |
|
2168 |
|
2119 |
o date for searching or filtering on dates. The syntax for the argument
|
2169 |
* date for searching or filtering on dates. The syntax for the argument
|
2120 |
is based on the ISO8601 standard for dates and time intervals. Only
|
2170 |
is based on the ISO8601 standard for dates and time intervals. Only
|
2121 |
dates are supported, no times. The general syntax is 2 elements
|
2171 |
dates are supported, no times. The general syntax is 2 elements
|
2122 |
separated by a / character. Each element can be a date or a period of
|
2172 |
separated by a / character. Each element can be a date or a period of
|
2123 |
time. Periods are specified as PnYnMnD. The n numbers are the
|
2173 |
time. Periods are specified as PnYnMnD. The n numbers are the
|
2124 |
respective numbers of years, months or days, any of which may be
|
2174 |
respective numbers of years, months or days, any of which may be
|
2125 |
missing. Dates are specified as YYYY-MM-DD. The days and months parts
|
2175 |
missing. Dates are specified as YYYY-MM-DD. The days and months parts
|
2126 |
may be missing. If the / is present but an element is missing, the
|
2176 |
may be missing. If the / is present but an element is missing, the
|
2127 |
missing element is interpreted as the lowest or highest date in the
|
2177 |
missing element is interpreted as the lowest or highest date in the
|
2128 |
index. Examples:
|
2178 |
index. Examples:
|
2129 |
|
2179 |
|
2130 |
o 2001-03-01/2002-05-01 the basic syntax for an interval of dates.
|
2180 |
* 2001-03-01/2002-05-01 the basic syntax for an interval of dates.
|
2131 |
|
2181 |
|
2132 |
o 2001-03-01/P1Y2M the same specified with a period.
|
2182 |
* 2001-03-01/P1Y2M the same specified with a period.
|
2133 |
|
2183 |
|
2134 |
o 2001/ from the beginning of 2001 to the latest date in the index.
|
2184 |
* 2001/ from the beginning of 2001 to the latest date in the index.
|
2135 |
|
2185 |
|
2136 |
o 2001 the whole year of 2001
|
2186 |
* 2001 the whole year of 2001
|
2137 |
|
2187 |
|
2138 |
o P2D/ means 2 days ago up to now if there are no documents with
|
2188 |
* P2D/ means 2 days ago up to now if there are no documents with
|
2139 |
dates in the future.
|
2189 |
dates in the future.
|
2140 |
|
2190 |
|
2141 |
o /2003 all documents from 2003 or older.
|
2191 |
* /2003 all documents from 2003 or older.
|
2142 |
|
2192 |
|
2143 |
Periods can also be specified with small letters (e.g. p2y).
|
2193 |
Periods can also be specified with small letters (e.g. p2y).
|
2144 |
|
2194 |
|
2145 |
o mime or format for specifying the MIME type. This one is quite special
|
2195 |
* mime or format for specifying the MIME type. This one is quite special
|
2146 |
because you can specify several values which will be OR'ed (the normal
|
2196 |
because you can specify several values which will be OR'ed (the normal
|
2147 |
default for the language is AND). Ex: mime:text/plain mime:text/html.
|
2197 |
default for the language is AND). Ex: mime:text/plain mime:text/html.
|
2148 |
Specifying an explicit boolean operator before a mime specification is
|
2198 |
Specifying an explicit boolean operator before a mime specification is
|
2149 |
not supported and will produce strange results. You can filter out
|
2199 |
not supported and will produce strange results. You can filter out
|
2150 |
certain types by using negation (-mime:some/type), and you can use
|
2200 |
certain types by using negation (-mime:some/type), and you can use
|
2151 |
wildcards in the value (mime:text/*). Note that mime is the ONLY field
|
2201 |
wildcards in the value (mime:text/*). Note that mime is the ONLY field
|
2152 |
with an OR default. You do need to use OR with ext terms for example.
|
2202 |
with an OR default. You do need to use OR with ext terms for example.
|
2153 |
|
2203 |
|
2154 |
o type or rclcat for specifying the category (as in
|
2204 |
* type or rclcat for specifying the category (as in
|
2155 |
text/media/presentation/etc.). The classification of MIME types in
|
2205 |
text/media/presentation/etc.). The classification of MIME types in
|
2156 |
categories is defined in the Recoll configuration (mimeconf), and can
|
2206 |
categories is defined in the Recoll configuration (mimeconf), and can
|
2157 |
be modified or extended. The default category names are those which
|
2207 |
be modified or extended. The default category names are those which
|
2158 |
permit filtering results in the main GUI screen. Categories are OR'ed
|
2208 |
permit filtering results in the main GUI screen. Categories are OR'ed
|
2159 |
like MIME types above. This can't be negated with - either.
|
2209 |
like MIME types above. This can't be negated with - either.
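
The size multipliers and the PnYnMnD period notation described in the list above can be made concrete with a short Python sketch. The function names are hypothetical and the parsing rules are inferred from the descriptions, not taken from Recoll's own parser.

```python
import re

# Decimal multipliers, as stated for the size criterion.
MULTIPLIERS = {"k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12}

def parse_size(text):
    """Convert a size value such as '10000', '1k' or '2M' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgGtT]?)", text)
    if match is None:
        raise ValueError(f"bad size value: {text!r}")
    number, suffix = match.groups()
    return int(number) * MULTIPLIERS.get(suffix.lower(), 1)

def parse_period(text):
    """Convert a period such as 'P1Y2M' or 'p2d' to (years, months, days)."""
    match = re.fullmatch(
        r"[pP](?:(\d+)[yY])?(?:(\d+)[mM])?(?:(\d+)[dD])?", text)
    if match is None or not any(match.groups()):
        raise ValueError(f"bad period: {text!r}")
    # Missing parts count as zero.
    return tuple(int(part) if part else 0 for part in match.groups())

print(parse_size("1k"))
print(parse_period("P1Y2M"))
```

Under these rules size>1k compares against 1000 bytes, and P1Y2M stands for one year and two months.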
|
|
... |
|
... |
2174 |
Some characters are recognized as search modifiers when found immediately
|
2224 |
Some characters are recognized as search modifiers when found immediately
|
2175 |
after the closing double quote of a phrase, as in "some
|
2225 |
after the closing double quote of a phrase, as in "some
|
2176 |
term"modifierchars. The actual "phrase" can be a single term of course.
|
2226 |
term"modifierchars. The actual "phrase" can be a single term of course.
|
2177 |
Supported modifiers:
|
2227 |
Supported modifiers:
|
2178 |
|
2228 |
|
2179 |
o l can be used to turn off stemming (mostly makes sense with p because
|
2229 |
* l can be used to turn off stemming (mostly makes sense with p because
|
2180 |
stemming is off by default for phrases).
|
2230 |
stemming is off by default for phrases).
|
2181 |
|
2231 |
|
2182 |
o o can be used to specify a "slack" for phrase and proximity searches:
|
2232 |
* o can be used to specify a "slack" for phrase and proximity searches:
|
2183 |
the number of additional terms that may be found between the specified
|
2233 |
the number of additional terms that may be found between the specified
|
2184 |
ones. If o is followed by an integer number, this is the slack, else
|
2234 |
ones. If o is followed by an integer number, this is the slack, else
|
2185 |
the default is 10.
|
2235 |
the default is 10.
|
2186 |
|
2236 |
|
2187 |
o p can be used to turn the default phrase search into a proximity one
|
2237 |
* p can be used to turn the default phrase search into a proximity one
|
2188 |
(unordered). Example: "order any in"p
|
2238 |
(unordered). Example: "order any in"p
|
2189 |
|
2239 |
|
2190 |
o C will turn on case sensitivity (if the index supports it).
|
2240 |
* C will turn on case sensitivity (if the index supports it).
|
2191 |
|
2241 |
|
2192 |
o D will turn on diacritics sensitivity (if the index supports it).
|
2242 |
* D will turn on diacritics sensitivity (if the index supports it).
|
2193 |
|
2243 |
|
2194 |
o A weight can be specified for a query element by specifying a decimal
|
2244 |
* A weight can be specified for a query element by specifying a decimal
|
2195 |
value at the start of the modifiers. Example: "Important"2.5.
|
2245 |
value at the start of the modifiers. Example: "Important"2.5.
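
The modifier string that follows the closing quote can be decomposed as in the following Python sketch. It follows the descriptions above (an optional leading weight, then letters, with o optionally followed by an integer). It is an assumption-laden illustration, not Recoll's parser, and the function name is hypothetical.

```python
import re

def parse_modifiers(mods):
    """Split a modifier string into weight, slack and single-letter flags."""
    result = {"weight": None, "slack": None, "flags": set()}
    # An optional decimal weight comes first, as in "Important"2.5
    match = re.match(r"\d+(?:\.\d+)?", mods)
    if match:
        result["weight"] = float(match.group(0))
        mods = mods[match.end():]
    for letter, digits in re.findall(r"([A-Za-z])(\d*)", mods):
        if letter == "o":
            # Slack: explicit integer if given, else the default of 10.
            result["slack"] = int(digits) if digits else 10
        else:
            result["flags"].add(letter)
    return result

print(parse_modifiers("p"))
print(parse_modifiers("o3p"))
print(parse_modifiers("2.5"))
```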
|
2196 |
|
2246 |
|
2197 |
3.6. Search case and diacritics sensitivity
|
2247 |
3.6. Search case and diacritics sensitivity
|
2198 |
|
2248 |
|
2199 |
For Recoll versions 1.18 and later, and when working with a raw index (not
|
2249 |
For Recoll versions 1.18 and later, and when working with a raw index (not
|
|
... |
|
... |
2257 |
All words entered in Recoll search fields will be processed for wildcard
|
2307 |
All words entered in Recoll search fields will be processed for wildcard
|
2258 |
expansion before the request is finally executed.
|
2308 |
expansion before the request is finally executed.
|
2259 |
|
2309 |
|
2260 |
The wildcard characters are:
|
2310 |
The wildcard characters are:
|
2261 |
|
2311 |
|
2262 |
o * which matches 0 or more characters.
|
2312 |
* * which matches 0 or more characters.
|
2263 |
|
2313 |
|
2264 |
o ? which matches a single character.
|
2314 |
* ? which matches a single character.
|
2265 |
|
2315 |
|
2266 |
o [] which allow defining sets of characters to be matched (ex: [abc]
|
2316 |
* [] which allow defining sets of characters to be matched (ex: [abc]
|
2267 |
matches a single character which may be 'a' or 'b' or 'c', [0-9]
|
2317 |
matches a single character which may be 'a' or 'b' or 'c', [0-9]
|
2268 |
matches any single digit).
|
2318 |
matches any single digit).
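
Shell-style matching in Python's fnmatch module uses the same three constructs, so it can serve to experiment with patterns. This is only an analogy: the term list is made up, and Recoll matches against its own index terms with its own code.

```python
from fnmatch import fnmatchcase

# Hypothetical index terms.
terms = ["recoll", "recall", "record", "rec0ll", "recollindex"]

def expand(pattern):
    """Return the terms matched by a wildcard pattern, in list order."""
    return [term for term in terms if fnmatchcase(term, pattern)]

print(expand("rec?ll"))       # ? stands for exactly one character
print(expand("rec[ao]ll"))    # one character out of the set a, o
print(expand("rec[0-9]ll"))   # one digit
print(expand("recoll*"))      # * stands for zero or more characters
```

Note how recoll* also returns recoll itself, since * can match an empty string.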
|
2269 |
|
2319 |
|
2270 |
You should be aware of a few things when using wildcards.
|
2320 |
You should be aware of a few things when using wildcards.
|
2271 |
|
2321 |
|
2272 |
o Using a wildcard character at the beginning of a word can make for a
|
2322 |
* Using a wildcard character at the beginning of a word can make for a
|
2273 |
slow search because Recoll will have to scan the whole index term list
|
2323 |
slow search because Recoll will have to scan the whole index term list
|
2274 |
to find the matches. However, this is much less a problem for field
|
2324 |
to find the matches. However, this is much less a problem for field
|
2275 |
searches, and queries like author:*@domain.com can sometimes be very
|
2325 |
searches, and queries like author:*@domain.com can sometimes be very
|
2276 |
useful.
|
2326 |
useful.
|
2277 |
|
2327 |
|
2278 |
o For Recoll version 1.18 only, when working with a raw index (preserving
|
2328 |
* For Recoll version 1.18 only, when working with a raw index (preserving
|
2279 |
character case and diacritics), the literal part of a wildcard
|
2329 |
character case and diacritics), the literal part of a wildcard
|
2280 |
expression will be matched exactly for case and diacritics. This is
|
2330 |
expression will be matched exactly for case and diacritics. This is
|
2281 |
not true any more for versions 1.19 and later.
|
2331 |
not true any more for versions 1.19 and later.
|
2282 |
|
2332 |
|
2283 |
o Using a * at the end of a word can produce more matches than you would
|
2333 |
* Using a * at the end of a word can produce more matches than you would
|
2284 |
think, and strange search results. You can use the term explorer tool
|
2334 |
think, and strange search results. You can use the term explorer tool
|
2285 |
to check what completions exist for a given term. You can also see
|
2335 |
to check what completions exist for a given term. You can also see
|
2286 |
exactly what search was performed by clicking on the link at the top
|
2336 |
exactly what search was performed by clicking on the link at the top
|
2287 |
of the result list. In general, for natural language terms, stem
|
2337 |
of the result list. In general, for natural language terms, stem
|
2288 |
expansion will produce better results than an ending * (stem expansion
|
2338 |
expansion will produce better results than an ending * (stem expansion
|
|
... |
|
... |
2335 |
3.8. Desktop integration
|
2385 |
3.8. Desktop integration
|
2336 |
|
2386 |
|
2337 |
Being independent of the desktop type has its drawbacks: Recoll desktop
|
2387 |
Being independent of the desktop type has its drawbacks: Recoll desktop
|
2338 |
integration is minimal. However there are a few tools available:
|
2388 |
integration is minimal. However there are a few tools available:
|
2339 |
|
2389 |
|
2340 |
o The KDE KIO Slave was described in a previous section.
|
2390 |
* The KDE KIO Slave was described in a previous section.
|
2341 |
|
2391 |
|
2342 |
o If you use a recent version of Ubuntu Linux, you may find the Ubuntu
|
2392 |
* If you use a recent version of Ubuntu Linux, you may find the Ubuntu
|
2343 |
Unity Lens module useful.
|
2393 |
Unity Lens module useful.
|
2344 |
|
2394 |
|
2345 |
o There is also an independently developed Krunner plugin.
|
2395 |
* There is also an independently developed Krunner plugin.
|
2346 |
|
2396 |
|
2347 |
Here follow a few other things that may help.
|
2397 |
Here follow a few other things that may help.
|
2348 |
|
2398 |
|
2349 |
3.8.1. Hotkeying recoll
|
2399 |
3.8.1. Hotkeying recoll
|
2350 |
|
2400 |
|
|
... |
|
... |
2374 |
query (in query language form), and an icon which can be used to restrict
|
2424 |
query (in query language form), and an icon which can be used to restrict
|
2375 |
the search to certain types of files. It is quite primitive, and launches
|
2425 |
the search to certain types of files. It is quite primitive, and launches
|
2376 |
a new recoll GUI instance every time (even if it is already running). You
|
2426 |
a new recoll GUI instance every time (even if it is already running). You
|
2377 |
may find it useful anyway.
|
2427 |
may find it useful anyway.
|
2378 |
|
2428 |
|
2379 |
Chapter 4. Programming interface
|
2429 |
Chapter 4. Programming interface
|
2380 |
|
2430 |
|
2381 |
Recoll has an Application Programming Interface, usable both for indexing
|
2431 |
Recoll has an Application Programming Interface, usable both for indexing
|
2382 |
and searching, currently accessible from the Python language.
|
2432 |
and searching, currently accessible from the Python language.
|
2383 |
|
2433 |
|
2384 |
Another less radical way to extend the application is to write input
|
2434 |
Another less radical way to extend the application is to write input
|
|
... |
|
... |
2408 |
kind will not be described here.
|
2458 |
kind will not be described here.
|
2409 |
|
2459 |
|
2410 |
There are currently (1.18 and since 1.13) two kinds of external executable
|
2460 |
There are currently (1.18 and since 1.13) two kinds of external executable
|
2411 |
input handlers:
|
2461 |
input handlers:
|
2412 |
|
2462 |
|
2413 |
o Simple exec handlers run once and exit. They can be bare programs like
|
2463 |
* Simple exec handlers run once and exit. They can be bare programs like
|
2414 |
antiword, or scripts using other programs. They are very simple to
|
2464 |
antiword, or scripts using other programs. They are very simple to
|
2415 |
write, because they just need to print the converted document to the
|
2465 |
write, because they just need to print the converted document to the
|
2416 |
standard output. Their output can be plain text or HTML. HTML is
|
2466 |
standard output. Their output can be plain text or HTML. HTML is
|
2417 |
usually preferred because it can store metadata fields and it allows
|
2467 |
usually preferred because it can store metadata fields and it allows
|
2418 |
preserving some of the formatting for the GUI preview.
|
2468 |
preserving some of the formatting for the GUI preview.
|
2419 |
|
2469 |
|
2420 |
o Multiple execm handlers can process multiple files (sparing the
|
2470 |
* Multiple execm handlers can process multiple files (sparing the
|
2421 |
process startup time which can be very significant), or multiple
|
2471 |
process startup time which can be very significant), or multiple
|
2422 |
documents per file (e.g.: for zip or chm files). They communicate with
|
2472 |
documents per file (e.g.: for zip or chm files). They communicate with
|
2423 |
the indexer through a simple protocol, but are nevertheless a bit more
|
2473 |
the indexer through a simple protocol, but are nevertheless a bit more
|
2424 |
complicated than the older kind. Most new handlers are written in
|
2474 |
complicated than the older kind. Most new handlers are written in
|
2425 |
Python, using a common module to handle the protocol. There is an
|
2475 |
Python, using a common module to handle the protocol. There is an
|
|
... |
|
... |
2495 |
|
2545 |
|
2496 |
execm handlers sometimes need to make a choice for the nature of the ipath
|
2546 |
execm handlers sometimes need to make a choice for the nature of the ipath
|
2497 |
elements that they use in communication with the indexer. Here are a few
|
2547 |
elements that they use in communication with the indexer. Here are a few
|
2498 |
guidelines:
|
2548 |
guidelines:
|
2499 |
|
2549 |
|
2500 |
o Use ASCII or UTF-8 (if the identifier is an integer print it, for
|
2550 |
* Use ASCII or UTF-8 (if the identifier is an integer print it, for
|
2501 |
example, like printf %d would do).
|
2551 |
example, like printf %d would do).
|
2502 |
|
2552 |
|
2503 |
o If at all possible, the data should make some kind of sense when
|
2553 |
* If at all possible, the data should make some kind of sense when
|
2504 |
printed to a log file to help with debugging.
|
2554 |
printed to a log file to help with debugging.
|
2505 |
|
2555 |
|
2506 |
o Recoll uses a colon (:) as a separator to store a complex path
|
2556 |
* Recoll uses a colon (:) as a separator to store a complex path
|
2507 |
internally (for deeper embedding). Colons inside the ipath elements
|
2557 |
internally (for deeper embedding). Colons inside the ipath elements
|
2508 |
output by a handler will be escaped, but would be a bad choice as a
|
2558 |
output by a handler will be escaped, but would be a bad choice as a
|
2509 |
handler-specific separator (mostly, again, for debugging issues).
|
2559 |
handler-specific separator (mostly, again, for debugging issues).
|
2510 |
|
2560 |
|
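The guidelines above can be sketched in Python as follows. This is only an
illustration: the function names and the "|" separator are hypothetical
choices made for this example, not part of the Recoll handler module.

```python
def make_ipath_element(identifier):
    """Return a printable ipath element for an integer or string identifier."""
    if isinstance(identifier, int):
        # Integers are printed in decimal, like printf %d would do.
        return "%d" % identifier
    text = str(identifier)
    # Check that the element can be encoded as UTF-8 (raises on failure).
    text.encode("utf-8")
    return text


def join_ipath(elements, separator="|"):
    """Join handler-level elements with a handler-specific separator.

    The default "|" is an arbitrary choice for this sketch. A colon is
    refused because Recoll already uses it internally.
    """
    if separator == ":":
        raise ValueError("':' is used internally by Recoll; pick another one")
    return separator.join(make_ipath_element(e) for e in elements)
```

Both results are plain strings, so they stay readable when they end up in a
log file.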
2511 |
In any case, the main goal is that it should be easy for the handler to
|
2561 |
In any case, the main goal is that it should be easy for the handler to
|
|
... |
|
... |
2546 |
|
2596 |
|
2547 |
application/x-chm = execm rclchm
|
2597 |
application/x-chm = execm rclchm
|
2548 |
|
2598 |
|
2549 |
The fragment specifies that:
|
2599 |
The fragment specifies that:
|
2550 |
|
2600 |
|
2551 |
o application/msword files are processed by executing the antiword
|
2601 |
* application/msword files are processed by executing the antiword
|
2552 |
program, which outputs text/plain encoded in utf-8.
|
2602 |
program, which outputs text/plain encoded in utf-8.
|
2553 |
|
2603 |
|
2554 |
o application/ogg files are processed by the rclogg script, with default
|
2604 |
* application/ogg files are processed by the rclogg script, with default
|
2555 |
output type (text/html, with encoding specified in the header, or
|
2605 |
output type (text/html, with encoding specified in the header, or
|
2556 |
utf-8 by default).
|
2606 |
utf-8 by default).
|
2557 |
|
2607 |
|
2558 |
o text/rtf is processed by unrtf, which outputs text/html. The
|
2608 |
* text/rtf is processed by unrtf, which outputs text/html. The
|
2559 |
iso-8859-1 encoding is specified because it is not the utf-8 default,
|
2609 |
iso-8859-1 encoding is specified because it is not the utf-8 default,
|
2560 |
and not output by unrtf in the HTML header section.
|
2610 |
and not output by unrtf in the HTML header section.
|
2561 |
|
2611 |
|
2562 |
o application/x-chm is processed by a persistent handler. This is
|
2612 |
* application/x-chm is processed by a persistent handler. This is
|
2563 |
determined by the execm keyword.
|
2613 |
determined by the execm keyword.
|
2564 |
|
2614 |
|
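To make the reading of such a fragment concrete, here is a small Python
sketch which splits one handler definition line into its parts. It is not
the code used by recollindex. It assumes that optional attributes follow the
command as semicolon-separated name=value pairs (mimetype, charset), and it
applies the defaults described above (text/html output, utf-8 encoding).

```python
def parse_handler_line(line):
    """Split 'mimetype = exec|execm command [; attr=value ...]' into a dict."""
    mime, sep, rest = line.partition("=")
    if not sep:
        raise ValueError("not a handler definition: %r" % line)
    parts = [p.strip() for p in rest.split(";")]
    words = parts[0].split()
    if not words or words[0] not in ("exec", "execm"):
        raise ValueError("expected exec or execm in: %r" % line)
    result = {
        "mime": mime.strip(),
        # execm designates a persistent handler, exec a run-once one.
        "persistent": words[0] == "execm",
        "command": words[1:],
        # Defaults when the line does not state the output type/encoding.
        "mimetype": "text/html",
        "charset": "utf-8",
    }
    for attr in parts[1:]:
        name, _, value = attr.partition("=")
        if name.strip():
            result[name.strip()] = value.strip()
    return result


chm = parse_handler_line("application/x-chm = execm rclchm")
# Hypothetical line, written only to exercise the attribute syntax:
rtf = parse_handler_line(
    "text/rtf = exec unrtf; charset=iso-8859-1; mimetype=text/html")
```

With these inputs, chm["persistent"] is True and rtf["charset"] holds the
explicitly specified iso-8859-1 value.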
2565 |
4.1.4. Input handler HTML output
|
2615 |
4.1.4. Input handler HTML output
|
2566 |
|
2616 |
|
2567 |
The output HTML could be very minimal like the following example:
|
2617 |
The output HTML could be very minimal like the following example:
|
|
... |
|
... |
2651 |
Recoll defines a number of default fields. Additional ones can be output
|
2701 |
Recoll defines a number of default fields. Additional ones can be output
|
2652 |
by handlers, and described in the fields configuration file.
|
2702 |
by handlers, and described in the fields configuration file.
|
2653 |
|
2703 |
|
2654 |
Fields can be:
|
2704 |
Fields can be:
|
2655 |
|
2705 |
|
2656 |
o indexed, meaning that their terms are separately stored in inverted
|
2706 |
* indexed, meaning that their terms are separately stored in inverted
|
2657 |
lists (with a specific prefix), and that a field-specific search is
|
2707 |
lists (with a specific prefix), and that a field-specific search is
|
2658 |
possible.
|
2708 |
possible.
|
2659 |
|
2709 |
|
2660 |
o stored, meaning that their value is recorded in the index data record
|
2710 |
* stored, meaning that their value is recorded in the index data record
|
2661 |
for the document, and can be returned and displayed with search
|
2711 |
for the document, and can be returned and displayed with search
|
2662 |
results.
|
2712 |
results.
|
2663 |
|
2713 |
|
2664 |
A field can be indexed, stored, or both. This and other aspects
|
2714 |
A field can be indexed, stored, or both. This and other aspects
|
2665 |
of field handling are defined in the fields configuration file.
|
2715 |
of field handling are defined in the fields configuration file.
|
2666 |
|
2716 |
|
2667 |
The sequence of events for field processing is as follows:
|
2717 |
The sequence of events for field processing is as follows:
|
2668 |
|
2718 |
|
2669 |
o During indexing, recollindex scans all meta fields in HTML documents
|
2719 |
* During indexing, recollindex scans all meta fields in HTML documents
|
2670 |
(most document types are transformed into HTML at some point). It
|
2720 |
(most document types are transformed into HTML at some point). It
|
2671 |
compares the name for each element to the configuration defining what
|
2721 |
compares the name for each element to the configuration defining what
|
2672 |
should be done with fields (the fields file).
|
2722 |
should be done with fields (the fields file).
|
2673 |
|
2723 |
|
2674 |
o If the name for the meta element matches one for a field that should
|
2724 |
* If the name for the meta element matches one for a field that should
|
2675 |
be indexed, the contents are processed and the terms are entered into
|
2725 |
be indexed, the contents are processed and the terms are entered into
|
2676 |
the index with the prefix defined in the fields file.
|
2726 |
the index with the prefix defined in the fields file.
|
2677 |
|
2727 |
|
2678 |
o If the name for the meta element matches one for a field that should
|
2728 |
* If the name for the meta element matches one for a field that should
|
2679 |
be stored, the content of the element is stored with the document data
|
2729 |
be stored, the content of the element is stored with the document data
|
2680 |
record, from which it can be extracted and displayed at query time.
|
2730 |
record, from which it can be extracted and displayed at query time.
|
2681 |
|
2731 |
|
2682 |
o At query time, if a field search is performed, the index prefix is
|
2732 |
* At query time, if a field search is performed, the index prefix is
|
2683 |
computed and the match is only performed against appropriately
|
2733 |
computed and the match is only performed against appropriately
|
2684 |
prefixed terms in the index.
|
2734 |
prefixed terms in the index.
|
2685 |
|
2735 |
|
2686 |
o At query time, the field can be displayed inside the result list by
|
2736 |
* At query time, the field can be displayed inside the result list by
|
2687 |
using the appropriate directive in the definition of the result list
|
2737 |
using the appropriate directive in the definition of the result list
|
2688 |
paragraph format. All fields are displayed on the fields screen of the
|
2738 |
paragraph format. All fields are displayed on the fields screen of the
|
2689 |
preview window (which you can reach through the right-click menu).
|
2739 |
preview window (which you can reach through the right-click menu).
|
2690 |
This is independent of whether the search which produced the
|
2740 |
This is independent of whether the search which produced the
|
2691 |
results used the field.
|
2741 |
results used the field.
|
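The sequence above can be illustrated with a toy Python model. It is a
simplified sketch, not Recoll code: the two tables stand in for the contents
of the fields file, their values are made up for the example, and real term
processing does more than splitting on white space and lowercasing.

```python
# Hypothetical stand-ins for what the fields file would define.
FIELD_PREFIXES = {"author": "A", "title": "S"}   # indexed fields
STORED_FIELDS = {"author", "keywords"}           # stored fields


def process_meta_fields(meta):
    """Indexing time: return (prefixed index terms, stored data record)."""
    terms = []
    stored = {}
    for name, content in meta.items():
        key = name.lower()
        prefix = FIELD_PREFIXES.get(key)
        if prefix is not None:
            # Indexed field: each term is entered with the field prefix.
            terms.extend(prefix + word.lower() for word in content.split())
        if key in STORED_FIELDS:
            # Stored field: the raw value is kept for display.
            stored[key] = content
    return terms, stored


def field_query_term(field, word):
    """Query time: compute the prefixed term to look up for field:word."""
    return FIELD_PREFIXES[field.lower()] + word.lower()


terms, stored = process_meta_fields(
    {"author": "Jane Doe", "keywords": "search index"})
```

In this model, author is both indexed and stored, while keywords is stored
only: it can be displayed with results but not searched as a field.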
|
... |
|
... |
2747 |
searching one is used in the Recoll Ubuntu Unity Lens and Recoll Web UI.
|
2797 |
searching one is used in the Recoll Ubuntu Unity Lens and Recoll Web UI.
|
2748 |
|
2798 |
|
2749 |
The API is inspired by the Python database API specification. There were
|
2799 |
The API is inspired by the Python database API specification. There were
|
2750 |
two major changes in recent Recoll versions:
|
2800 |
two major changes in recent Recoll versions:
|
2751 |
|
2801 |
|
2752 |
o The basis for the Recoll API changed from Python database API version
|
2802 |
* The basis for the Recoll API changed from Python database API version
|
2753 |
1.0 (Recoll versions up to 1.18.1), to version 2.0 (Recoll 1.18.2 and
|
2803 |
1.0 (Recoll versions up to 1.18.1), to version 2.0 (Recoll 1.18.2 and
|
2754 |
later).
|
2804 |
later).
|
2755 |
o The recoll module became a package (with an internal recoll module) as
|
2805 |
* The recoll module became a package (with an internal recoll module) as
|
2756 |
of Recoll version 1.19, in order to add more functions. For existing
|
2806 |
of Recoll version 1.19, in order to add more functions. For existing
|
2757 |
code, this only changes the way the interface must be imported.
|
2807 |
code, this only changes the way the interface must be imported.
|
2758 |
|
2808 |
|
2759 |
We will mostly describe the new API and package structure here. A
|
2809 |
We will mostly describe the new API and package structure here. A
|
2760 |
paragraph at the end of this section will explain a few differences and
|
2810 |
paragraph at the end of this section will explain a few differences and
|
|
... |
|
... |
2780 |
|
2830 |
|
2781 |
4.3.2.2. Recoll package
|
2831 |
4.3.2.2. Recoll package
|
2782 |
|
2832 |
|
2783 |
The recoll package contains two modules:
|
2833 |
The recoll package contains two modules:
|
2784 |
|
2834 |
|
2785 |
o The recoll module contains functions and classes used to query (or
|
2835 |
* The recoll module contains functions and classes used to query (or
|
2786 |
update) the index.
|
2836 |
update) the index.
|
2787 |
|
2837 |
|
2788 |
o The rclextract module contains functions and classes used to access
|
2838 |
* The rclextract module contains functions and classes used to access
|
2789 |
document data.
|
2839 |
document data.
|
2790 |
|
2840 |
|
2791 |
4.3.2.3. The recoll module
|
2841 |
4.3.2.3. The recoll module
|
2792 |
|
2842 |
|
2793 |
Functions
|
2843 |
Functions
|
2794 |
|
2844 |
|
2795 |
connect(confdir=None, extra_dbs=None, writable = False)
|
2845 |
connect(confdir=None, extra_dbs=None, writable = False)
|
2796 |
The connect() function connects to one or several Recoll index(es)
|
2846 |
The connect() function connects to one or several Recoll index(es)
|
2797 |
and returns a Db object.
|
2847 |
and returns a Db object.
|
2798 |
o confdir may specify a configuration directory. The usual
|
2848 |
* confdir may specify a configuration directory. The usual
|
2799 |
defaults apply.
|
2849 |
defaults apply.
|
2800 |
o extra_dbs is a list of additional indexes (Xapian
|
2850 |
* extra_dbs is a list of additional indexes (Xapian
|
2801 |
directories).
|
2851 |
directories).
|
2802 |
o writable decides if we can index new data through this
|
2852 |
* writable decides if we can index new data through this
|
2803 |
connection.
|
2853 |
connection.
|
2804 |
This call initializes the recoll module, and it should always be
|
2854 |
This call initializes the recoll module, and it should always be
|
2805 |
performed before any other call or object creation.
|
2855 |
performed before any other call or object creation.
|
2806 |
|
2856 |
|
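A minimal usage sketch follows. It assumes the package layout of Recoll 1.19
and later (from recoll import recoll), and it relies on the Db.query(),
Query.execute() and Query.rowcount members of the Query class; check their
descriptions in the Classes section before depending on them. The
count_results() helper itself is hypothetical.

```python
def count_results(query_string, confdir=None):
    """Return the result count for a query language string.

    Returns None if the Recoll Python package is not installed.
    """
    try:
        from recoll import recoll  # package layout as of Recoll 1.19
    except ImportError:
        return None
    # connect() must come first; the connection is read-only by default.
    db = recoll.connect(confdir=confdir)
    query = db.query()
    query.execute(query_string)
    return query.rowcount
```

Passing confdir=None lets the usual defaults select the configuration
directory, as described above.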
2807 |
Classes
|
2857 |
Classes
|
|
... |
|
... |
3045 |
|
3095 |
|
3046 |
rownum = query.next if type(query.next) == int else \
|
3096 |
rownum = query.next if type(query.next) == int else \
|
3047 |
query.rownumber
|
3097 |
query.rownumber
|
3048 |
|
3098 |
|
3049 |
|
3099 |
|
3050 |
Chapter 5. Installation and configuration
|
3100 |
Chapter 5. Installation and configuration
|
3051 |
|
3101 |
|
3052 |
5.1. Installing a binary copy
|
3102 |
5.1. Installing a binary copy
|
3053 |
|
3103 |
|
3054 |
There are three types of binary Recoll installations:
|
3104 |
There are three types of binary Recoll installations:
|
3055 |
|
3105 |
|
3056 |
o Through your system's normal software distribution framework (e.g.,
|
3106 |
* Through your system's normal software distribution framework (e.g.,
|
3057 |
Debian/Ubuntu apt, FreeBSD ports, etc.).
|
3107 |
Debian/Ubuntu apt, FreeBSD ports, etc.).
|
3058 |
|
3108 |
|
3059 |
o From a package downloaded from the Recoll web site.
|
3109 |
* From a package downloaded from the Recoll web site.
|
3060 |
|
3110 |
|
3061 |
o From a prebuilt tree downloaded from the Recoll web site.
|
3111 |
* From a prebuilt tree downloaded from the Recoll web site.
|
3062 |
|
3112 |
|
3063 |
In all cases, the strict software dependencies (e.g. on Xapian or iconv)
|
3113 |
In all cases, the strict software dependencies (e.g. on Xapian or iconv)
|
3064 |
will be automatically satisfied; you should not have to worry about them.
|
3114 |
will be automatically satisfied; you should not have to worry about them.
|
3065 |
|
3115 |
|
3066 |
You will only have to check or install supporting applications for the
|
3116 |
You will only have to check or install supporting applications for the
|
|
... |
|
... |
3120 |
by ad hoc handler code now use the xsltproc command, which usually comes
|
3170 |
by ad hoc handler code now use the xsltproc command, which usually comes
|
3121 |
with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.
|
3171 |
with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.
|
3122 |
|
3172 |
|
3123 |
Now for the list:
|
3173 |
Now for the list:
|
3124 |
|
3174 |
|
3125 |
o Openoffice files need unzip and xsltproc.
|
3175 |
* Openoffice files need unzip and xsltproc.
|
3126 |
|
3176 |
|
3127 |
o PDF files need pdftotext which is part of the Xpdf or Poppler
|
3177 |
* PDF files need pdftotext which is part of the Xpdf or Poppler
|
3128 |
packages.
|
3178 |
packages.
|
3129 |
|
3179 |
|
3130 |
o Postscript files need pstotext. The original version has an issue with
|
3180 |
* Postscript files need pstotext. The original version has an issue with
|
3131 |
shell characters in file names, which is corrected in recent packages.
|
3181 |
shell characters in file names, which is corrected in recent packages.
|
3132 |
See http://www.recoll.org/features.html for more detail.
|
3182 |
See http://www.recoll.org/features.html for more detail.
|
3133 |
|
3183 |
|
3134 |
o MS Word needs antiword. It is also useful to have wvWare installed as
|
3184 |
* MS Word needs antiword. It is also useful to have wvWare installed as
|
3135 |
it may be used as a fallback for some files which antiword does not
|
3185 |
it may be used as a fallback for some files which antiword does not
|
3136 |
handle.
|
3186 |
handle.
|
3137 |
|
3187 |
|
3138 |
o MS Excel and PowerPoint are processed by internal Python handlers.
|
3188 |
* MS Excel and PowerPoint are processed by internal Python handlers.
|
3139 |
|
3189 |
|
3140 |
o MS Open XML (docx) needs xsltproc.
|
3190 |
* MS Open XML (docx) needs xsltproc.
|
3141 |
|
3191 |
|
3142 |
o Wordperfect files need wpd2html from the libwpd (or libwpd-tools on
|
3192 |
* Wordperfect files need wpd2html from the libwpd (or libwpd-tools on
|
3143 |
Ubuntu) package.
|
3193 |
Ubuntu) package.
|
3144 |
|
3194 |
|
3145 |
o RTF files need unrtf, which, in its standard version, has much trouble
|
3195 |
* RTF files need unrtf, which, in its standard version, has much trouble
|
3146 |
with non-western character sets. Check
|
3196 |
with non-western character sets. Check
|
3147 |
http://www.recoll.org/features.html.
|
3197 |
http://www.recoll.org/features.html.
|
3148 |
|
3198 |
|
3149 |
o TeX files need untex or detex. Check
|
3199 |
* TeX files need untex or detex. Check
|
3150 |
http://www.recoll.org/features.html for sources if it's not packaged
|
3200 |
http://www.recoll.org/features.html for sources if it's not packaged
|
3151 |
for your distribution.
|
3201 |
for your distribution.
|
3152 |
|
3202 |
|
3153 |
o dvi files need dvips.
|
3203 |
* dvi files need dvips.
|
3154 |
|
3204 |
|
3155 |
o djvu files need djvutxt and djvused from the DjVuLibre package.
|
3205 |
* djvu files need djvutxt and djvused from the DjVuLibre package.
|
3156 |
|
3206 |
|
3157 |
o Audio files: Recoll releases 1.14 and later use a single Python
|
3207 |
* Audio files: Recoll releases 1.14 and later use a single Python
|
3158 |
handler based on mutagen for all audio file types.
|
3208 |
handler based on mutagen for all audio file types.
|
3159 |
|
3209 |
|
3160 |
o Pictures: Recoll uses the Exiftool Perl package to extract tag
|
3210 |
* Pictures: Recoll uses the Exiftool Perl package to extract tag
|
3161 |
information. Most image file formats are supported. Note that there
|
3211 |
information. Most image file formats are supported. Note that there
|
3162 |
may not be much interest in indexing the technical tags (image size,
|
3212 |
may not be much interest in indexing the technical tags (image size,
|
3163 |
aperture, etc.). This is only of interest if you store personal tags
|
3213 |
aperture, etc.). This is only of interest if you store personal tags
|
3164 |
or textual descriptions inside the image files.
|
3214 |
or textual descriptions inside the image files.
|
3165 |
|
3215 |
|
3166 |
o chm: files in Microsoft help format need Python and the pychm module
|
3216 |
* chm: files in Microsoft help format need Python and the pychm module
|
3167 |
(which needs chmlib).
|
3217 |
(which needs chmlib).
|
3168 |
|
3218 |
|
3169 |
o ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar
|
3219 |
* ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar
|
3170 |
module. icalendar is not needed for newer versions, which use internal
|
3220 |
module. icalendar is not needed for newer versions, which use internal
|
3171 |
code.
|
3221 |
code.
|
3172 |
|
3222 |
|
3173 |
o Zip archives need Python (and the standard zipfile module).
|
3223 |
* Zip archives need Python (and the standard zipfile module).
|
3174 |
|
3224 |
|
3175 |
o Rar archives need Python, the rarfile Python module and the unrar
|
3225 |
* Rar archives need Python, the rarfile Python module and the unrar
|
3176 |
utility.
|
3226 |
utility.
|
3177 |
|
3227 |
|
3178 |
o Midi karaoke files need Python and the Midi module.
|
3228 |
* Midi karaoke files need Python and the Midi module.
|
3179 |
|
3229 |
|
3180 |
o Konqueror webarchive files need Python (uses the tarfile module).
|
3230 |
* Konqueror webarchive files need Python (uses the tarfile module).
|
3181 |
|
3231 |
|
3182 |
o Mimehtml web archive format (support based on the email handler, which
|
3232 |
* Mimehtml web archive format (support based on the email handler, which
|
3183 |
introduces some mild weirdness, but still usable).
|
3233 |
introduces some mild weirdness, but still usable).
|
3184 |
|
3234 |
|
3185 |
Text, HTML, email folders, and Scribus files are processed internally. Lyx
|
3235 |
Text, HTML, email folders, and Scribus files are processed internally. Lyx
|
3186 |
is used to index Lyx files. Many handlers need iconv and the standard sed
|
3236 |
is used to index Lyx files. Many handlers need iconv and the standard sed
|
3187 |
and awk.
|
3237 |
and awk.
|
|
... |
|
... |
3196 |
|
3246 |
|
3197 |
You may have to compile Xapian but this is easy.
|
3247 |
You may have to compile Xapian but this is easy.
|
3198 |
|
3248 |
|
3199 |
The shopping list:
|
3249 |
The shopping list:
|
3200 |
|
3250 |
|
3201 |
o C++ compiler. Up to Recoll version 1.13.04, its absence can manifest
|
3251 |
* C++ compiler. Up to Recoll version 1.13.04, its absence can manifest
|
3202 |
itself by strange messages about a missing iconv_open.
|
3252 |
itself by strange messages about a missing iconv_open.
|
3203 |
|
3253 |
|
3204 |
o Development files for Xapian core.
|
3254 |
* Development files for Xapian core.
|
3205 |
|
3255 |
|
3206 |
Important
|
3256 |
Important
|
3207 |
|
3257 |
|
3208 |
If you are building Xapian for an older CPU (before Pentium 4 or
|
3258 |
If you are building Xapian for an older CPU (before Pentium 4 or
|
3209 |
Athlon 64), you need to add the --disable-sse flag to the configure
|
3259 |
Athlon 64), you need to add the --disable-sse flag to the configure
|
3210 |
command. Otherwise all Xapian applications will crash with an illegal
|
3260 |
command. Otherwise all Xapian applications will crash with an illegal
|
3211 |
instruction error.
|
3261 |
instruction error.
|
3212 |
|
3262 |
|
3213 |
o Development files for Qt 4. Recoll has not been tested with Qt 5 yet.
|
3263 |
* Development files for Qt 4. Recoll has not been tested with Qt 5 yet.
|
3214 |
Recoll 1.15.9 was the last version to support Qt 3. If you do not want
|
3264 |
Recoll 1.15.9 was the last version to support Qt 3. If you do not want
|
3215 |
to install or build the Qt Webkit module, Recoll has a configuration
|
3265 |
to install or build the Qt Webkit module, Recoll has a configuration
|
3216 |
option to disable its use (see further).
|
3266 |
option to disable its use (see further).
|
3217 |
|
3267 |
|
3218 |
o Development files for X11 and zlib.
|
3268 |
* Development files for X11 and zlib.
|
3219 |
|
3269 |
|
3220 |
o You may also need libiconv. On Linux systems, the iconv interface is
|
3270 |
* You may also need libiconv. On Linux systems, the iconv interface is
|
3221 |
part of libc and you should not need to do anything special.
|
3271 |
part of libc and you should not need to do anything special.
|
3222 |
|
3272 |
|
3223 |
Check the Recoll download page for up-to-date version information.
|
3273 |
Check the Recoll download page for up-to-date version information.
|
3224 |
|
3274 |
|
3225 |
5.3.2. Building
|
3275 |
5.3.2. Building
|
|
... |
|
... |
3229 |
ok). If you build on another system, and need to modify things, I would
|
3279 |
ok). If you build on another system, and need to modify things, I would
|
3230 |
very much welcome patches.
|
3280 |
very much welcome patches.
|
3231 |
|
3281 |
|
3232 |
Configure options:
|
3282 |
Configure options:
|
3233 |
|
3283 |
|
3234 |
o --without-aspell will disable the code for phonetic matching of search
|
3284 |
* --without-aspell will disable the code for phonetic matching of search
|
3235 |
terms.
|
3285 |
terms.
|
3236 |
|
3286 |
|
3237 |
o --with-fam or --with-inotify will enable the code for real time
|
3287 |
* --with-fam or --with-inotify will enable the code for real time
|
3238 |
indexing. Inotify support is enabled by default on recent Linux
|
3288 |
indexing. Inotify support is enabled by default on recent Linux
|
3239 |
systems.
|
3289 |
systems.
|
3240 |
|
3290 |
|
3241 |
o --with-qzeitgeist will enable sending Zeitgeist events about the
|
3291 |
* --with-qzeitgeist will enable sending Zeitgeist events about the
|
3242 |
visited search results, and needs the qzeitgeist package.
|
3292 |
visited search results, and needs the qzeitgeist package.
|
3243 |
|
3293 |
|
3244 |
o --disable-webkit is available from version 1.17 to implement the
|
3294 |
* --disable-webkit is available from version 1.17 to implement the
|
3245 |
result list with a Qt QTextBrowser instead of a WebKit widget if you
|
3295 |
result list with a Qt QTextBrowser instead of a WebKit widget if you
|
3246 |
do not or can't depend on the latter.
|
3296 |
do not or can't depend on the latter.
|
3247 |
|
3297 |
|
3248 |
o --disable-idxthreads is available from version 1.19 to suppress
|
3298 |
* --disable-idxthreads is available from version 1.19 to suppress
|
3249 |
multithreading inside the indexing process. You can also use the
|
3299 |
multithreading inside the indexing process. You can also use the
|
3250 |
run-time configuration to restrict recollindex to using a single
|
3300 |
run-time configuration to restrict recollindex to using a single
|
3251 |
thread, but the compile-time option may disable a few more unused
|
3301 |
thread, but the compile-time option may disable a few more unused
|
3252 |
locks. This only applies to the use of multithreading for the core
|
3302 |
locks. This only applies to the use of multithreading for the core
|
3253 |
index processing (data input). The Recoll monitor mode always uses at
|
3303 |
index processing (data input). The Recoll monitor mode always uses at
|
3254 |
least two threads of execution.
|
3304 |
least two threads of execution.
|
3255 |
|
3305 |
|
3256 |
o --disable-python-module will avoid building the Python module.
|
3306 |
* --disable-python-module will avoid building the Python module.
|
3257 |
|
3307 |
|
3258 |
o --disable-xattr will prevent fetching data from file extended
|
3308 |
* --disable-xattr will prevent fetching data from file extended
|
3259 |
attributes. Beyond a few standard attributes, fetching extended
|
3309 |
attributes. Beyond a few standard attributes, fetching extended
|
3260 |
attributes data can only be useful if some application stores data in
|
3310 |
attributes data can only be useful if some application stores data in
|
3261 |
there, and also needs some simple configuration (see comments in the
|
3311 |
there, and also needs some simple configuration (see comments in the
|
3262 |
fields configuration file).
|
3312 |
fields configuration file).
|
3263 |
|
3313 |
|
3264 |
o --enable-camelcase will enable splitting camelCase words. This is not
|
3314 |
* --enable-camelcase will enable splitting camelCase words. This is not
|
3265 |
enabled by default as it has the unfortunate side-effect of making
|
3315 |
enabled by default as it has the unfortunate side-effect of making
|
3266 |
some phrase searches quite confusing: e.g., "MySQL manual" would be
|
3316 |
some phrase searches quite confusing: e.g., "MySQL manual" would be
|
3267 |
matched by "MySQL manual" and "my sql manual" but not "mysql manual"
|
3317 |
matched by "MySQL manual" and "my sql manual" but not "mysql manual"
|
3268 |
(only inside phrase searches).
|
3318 |
(only inside phrase searches).
|
3269 |
|
3319 |
|
3270 |
o --with-file-command Specify the version of the 'file' command to use
|
3320 |
* --with-file-command Specify the version of the 'file' command to use
|
3271 |
(e.g. --with-file-command=/usr/local/bin/file). Can be useful to enable
|
3321 |
(e.g. --with-file-command=/usr/local/bin/file). Can be useful to enable
|
3272 |
the GNU version on systems where the native one is bad.
|
3322 |
the GNU version on systems where the native one is bad.
|
3273 |
|
3323 |
|
3274 |
o --disable-qtgui Disable the Qt interface. Will allow building the
|
3324 |
* --disable-qtgui Disable the Qt interface. Will allow building the
|
3275 |
indexer and the command line search program in absence of a Qt
|
3325 |
indexer and the command line search program in absence of a Qt
|
3276 |
environment.
|
3326 |
environment.
|
3277 |
|
3327 |
|
3278 |
o --disable-x11mon Disable X11 connection monitoring inside recollindex.
|
3328 |
* --disable-x11mon Disable X11 connection monitoring inside recollindex.
|
3279 |
Together with --disable-qtgui, this allows building recoll without Qt
|
3329 |
Together with --disable-qtgui, this allows building recoll without Qt
|
3280 |
and X11.
|
3330 |
and X11.
|
3281 |
|
3331 |
|
3282 |
o --disable-pic will compile Recoll with position-dependent code. This
|
3332 |
* --disable-pic will compile Recoll with position-dependent code. This
|
3283 |
is incompatible with building the KIO or the Python or PHP extensions,
|
3333 |
is incompatible with building the KIO or the Python or PHP extensions,
|
3284 |
but might yield very marginally faster code.
|
3334 |
but might yield very marginally faster code.
|
3285 |
|
3335 |
|
3286 |
o Of course, the usual autoconf configure options, like --prefix, apply.
|
3336 |
* Of course, the usual autoconf configure options, like --prefix, apply.
|
3287 |
|
3337 |
|
3288 |
Normal procedure:
|
3338 |
Normal procedure:
|
3289 |
|
3339 |
|
3290 |
cd recoll-xxx
|
3340 |
cd recoll-xxx
|
3291 |
configure
|
3341 |
configure
|
|
... |
|
... |
3387 |
defaultcharset = utf-8
|
3437 |
defaultcharset = utf-8
|
3388 |
|
3438 |
|
3389 |
|
3439 |
|
3390 |
There are three kinds of lines:
|
3440 |
There are three kinds of lines:
|
3391 |
|
3441 |
|
3392 |
o Comment (starts with #) or empty.
|
3442 |
* Comment (starts with #) or empty.
|
3393 |
|
3443 |
|
3394 |
o Parameter assignment (name = value).
|
3444 |
* Parameter assignment (name = value).
|
3395 |
|
3445 |
|
3396 |
o Section definition ([somedirname]).
|
3446 |
* Section definition ([somedirname]).
|
3397 |
|
3447 |
|
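As an illustration of the three kinds of lines, here is a minimal Python
sketch of a reader for this format. It is not the parser used by Recoll, and
it deliberately ignores quoting and any line continuation syntax. The
/home/me/docs section name is a made-up example.

```python
def parse_config(text):
    """Return {section: {name: value}}; top-level parameters use ''."""
    sections = {"": {}}
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        # Kind 1: comment or empty line.
        if not line or line.startswith("#"):
            continue
        # Kind 3: section definition, in effect until the next one.
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            sections.setdefault(current, {})
            continue
        # Kind 2: parameter assignment.
        name, sep, value = line.partition("=")
        if not sep:
            continue  # unrecognized line, skipped in this sketch
        sections[current][name.strip()] = value.strip()
    return sections


cfg = parse_config(
    "# top-level default\n"
    "defaultcharset = utf-8\n"
    "\n"
    "[/home/me/docs]\n"
    "defaultcharset = iso-8859-1\n"
)
```

Here cfg[""] holds the top-level value and cfg["/home/me/docs"] holds the
value redefined for that sub-tree.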
3398 |
Depending on the type of configuration file, section definitions either
|
3448 |
Depending on the type of configuration file, section definitions either
|
3399 |
separate groups of parameters or allow redefining some parameters for a
|
3449 |
separate groups of parameters or allow redefining some parameters for a
|
3400 |
directory sub-tree. They stay in effect until another section definition,
|
3450 |
directory sub-tree. They stay in effect until another section definition,
|
3401 |
or the end of file, is encountered. Some of the parameters used for
|
3451 |
or the end of file, is encountered. Some of the parameters used for
|
|
... |
|
... |
3410 |
embedded spaces can be quoted using double-quotes.
|
3460 |
embedded spaces can be quoted using double-quotes.
|
3411 |
|
3461 |
|
3412 |
Encoding issues. Most of the configuration parameters are plain ASCII. Two
|
3462 |
Encoding issues. Most of the configuration parameters are plain ASCII. Two
|
3413 |
particular sets of values may cause encoding issues:
|
3463 |
particular sets of values may cause encoding issues:
|
3414 |
|
3464 |
|
3415 |
o File path parameters may contain non-ascii characters and should use
|
3465 |
* File path parameters may contain non-ascii characters and should use
|
3416 |
the exact same byte values as found in the file system directory.
|
3466 |
the exact same byte values as found in the file system directory.
|
3417 |
Usually, this means that the configuration file should use the system
|
3467 |
Usually, this means that the configuration file should use the system
|
3418 |
default locale encoding.
|
3468 |
default locale encoding.
|
3419 |
|
3469 |
|
3420 |
o The unac_except_trans parameter should be encoded in UTF-8. If your
|
3470 |
* The unac_except_trans parameter should be encoded in UTF-8. If your
|
3421 |
system locale is not UTF-8, and you need to also specify non-ascii
|
3471 |
system locale is not UTF-8, and you need to also specify non-ascii
|
3422 |
file paths, this poses a difficulty because common text editors cannot
|
3472 |
file paths, this poses a difficulty because common text editors cannot
|
3423 |
handle multiple encodings in a single file. In this relatively
|
3473 |
handle multiple encodings in a single file. In this relatively
|
3424 |
unlikely case, you can edit the configuration file as two separate
|
3474 |
unlikely case, you can edit the configuration file as two separate
|
3425 |
text files with appropriate encodings, and concatenate them to create
|
3475 |
text files with appropriate encodings, and concatenate them to create
|
|
... |
|
... |
3570 |
indexing, or for all files inside the selected subtrees,
|
3620 |
indexing, or for all files inside the selected subtrees,
|
3571 |
independently of MIME type.
|
3621 |
independently of MIME type.
|
3572 |
|
3622 |
|
3573 |
usesystemfilecommand
|
3623 |
usesystemfilecommand
|
3574 |
|
3624 |
|
3575 |
Decide if we use the file -i system command as a final step for
|
3625 |
Decide if we execute a system command (file -i by default) as a
|
3576 |
determining the MIME type for a file (the main procedure uses
|
3626 |
final step for determining the MIME type for a file (the main
|
3577 |
suffix associations as defined in the mimemap file). This can be
|
3627 |
procedure uses suffix associations as defined in the mimemap
|
3578 |
useful for files with suffix-less names, but it will also cause
|
3628 |
file). This can be useful for files with suffix-less names, but it
|
3579 |
the indexing of many bogus "text" files.
|
3629 |
will also cause the indexing of many bogus "text" files.
|
|
|
3630 |
|
|
|
3631 |
systemfilecommand
|
|
|
3632 |
|
|
|
3633 |
Command to use for MIME type determination if
|
|
|
3634 |
usesystemfilecommand is set. Recent versions of xdg-mime sometimes
|
|
|
3635 |
work better than file.
|
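The two-step procedure described for usesystemfilecommand (suffix associations first, an external command as a final step) could look roughly like the following Python sketch. It is not Recoll's implementation: the function guess_mime, its parameters, and the suffix_map dictionary standing in for the mimemap file are all assumptions made for illustration. The output parsing assumes the usual "path: type/subtype; charset=..." form printed by file -i.

```python
import os
import subprocess


def guess_mime(path, suffix_map, use_system_command=False,
               command=("file", "-i"), run=None):
    """Return a MIME type for path, or None (hypothetical helper)."""
    # Main procedure: look the file name suffix up in the suffix map.
    suffix = os.path.splitext(path)[1].lower()
    if suffix in suffix_map:
        return suffix_map[suffix]
    # Final step, only if enabled: ask an external command.
    if not use_system_command:
        return None
    if run is None:
        def run(argv):
            result = subprocess.run(argv, capture_output=True,
                                    text=True, check=False)
            return result.stdout
    output = run(list(command) + [path])
    # Keep what follows the last ": ", then drop "; charset=...".
    _, _, rest = output.rpartition(": ")
    mime = rest.split(";")[0].strip()
    return mime if "/" in mime else None


print(guess_mime("notes.blob", {".blob": "application/x-blobapp"}))
```

The run parameter exists only so that the external command can be replaced in tests. With use_system_command left at False, suffix-less files simply get no type, which avoids the bogus "text" files mentioned above.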
3580 |
|
3636 |
|
3581 |
processwebqueue
|
3637 |
processwebqueue
|
3582 |
|
3638 |
|
3583 |
If this is set, process the directory where Web browser plugins
|
3639 |
If this is set, process the directory where Web browser plugins
|
3584 |
copy visited pages for indexing.
|
3640 |
copy visited pages for indexing.
|
|
... |
|
... |
3996 |
The fields file has several sections, which each define an aspect of
|
4052 |
The fields file has several sections, which each define an aspect of
|
3997 |
fields processing. Quite often, you'll have to modify several sections to
|
4053 |
fields processing. Quite often, you'll have to modify several sections to
|
3998 |
obtain the desired behaviour.
|
4054 |
obtain the desired behaviour.
|
3999 |
|
4055 |
|
4000 |
We will only give a short description here, you should refer to the
|
4056 |
We will only give a short description here, you should refer to the
|
4001 |
comments inside the file for more detailed information.
|
4057 |
comments inside the default file for more detailed information.
|
4002 |
|
4058 |
|
4003 |
Field names should be lowercase alphabetic ASCII.
|
4059 |
Field names should be lowercase alphabetic ASCII.
|
4004 |
|
4060 |
|
4005 |
[prefixes]
|
4061 |
[prefixes]
|
4006 |
|
4062 |
|
|
... |
|
... |
4014 |
|
4070 |
|
4015 |
[aliases]
|
4071 |
[aliases]
|
4016 |
|
4072 |
|
4017 |
This section defines lists of synonyms for the canonical names
|
4073 |
This section defines lists of synonyms for the canonical names
|
4018 |
used inside the [prefixes] and [stored] sections
|
4074 |
used inside the [prefixes] and [stored] sections
|
|
|
4075 |
|
|
|
4076 |
[queryaliases]
|
|
|
4077 |
|
|
|
4078 |
This section also defines aliases for the canonical field names,
|
|
|
4079 |
with the difference that the substitution will only be used at
|
|
|
4080 |
query time, avoiding any possibility that the value would pick up
|
|
|
4081 |
random metadata from documents.
|
4019 |
|
4082 |
|
4020 |
handler-specific sections
|
4083 |
handler-specific sections
|
4021 |
|
4084 |
|
4022 |
Some input handlers may need specific configuration for handling
|
4085 |
Some input handlers may need specific configuration for handling
|
4023 |
fields. Only the email message handler currently has such a
|
4086 |
fields. Only the email message handler currently has such a
|
|
... |
|
... |
4037 |
|
4100 |
|
4038 |
[stored]
|
4101 |
[stored]
|
4039 |
# Store mailmytag inside the document data record (so that it can be
|
4102 |
# Store mailmytag inside the document data record (so that it can be
|
4040 |
# displayed - as %(mailmytag) - in result lists).
|
4103 |
# displayed - as %(mailmytag) - in result lists).
|
4041 |
mailmytag =
|
4104 |
mailmytag =
|
|
|
4105 |
|
|
|
4106 |
[queryaliases]
|
|
|
4107 |
filename = fn
|
|
|
4108 |
containerfilename = cfn
|
4042 |
|
4109 |
|
4043 |
[mail]
|
4110 |
[mail]
|
4044 |
# Extract the X-My-Tag mail header, and use it internally with the
|
4111 |
# Extract the X-My-Tag mail header, and use it internally with the
|
4045 |
# mailmytag field name
|
4112 |
# mailmytag field name
|
4046 |
x-my-tag = mailmytag
|
4113 |
x-my-tag = mailmytag
|
|
... |
|
... |
4131 |
mydoc.doc.gz).
|
4198 |
mydoc.doc.gz).
|
4132 |
|
4199 |
|
4133 |
The right side of each assignment holds a command to be executed for
|
4200 |
The right side of each assignment holds a command to be executed for
|
4134 |
opening the file. The following substitutions are performed:
|
4201 |
opening the file. The following substitutions are performed:
|
4135 |
|
4202 |
|
4136 |
o %D. Document date
|
4203 |
* %D. Document date
|
4137 |
|
4204 |
|
4138 |
o %f. File name. This may be the name of a temporary file if it was
|
4205 |
* %f. File name. This may be the name of a temporary file if it was
|
4139 |
necessary to create one (ie: to extract a subdocument from a
|
4206 |
necessary to create one (ie: to extract a subdocument from a
|
4140 |
container).
|
4207 |
container).
|
4141 |
|
4208 |
|
4142 |
o %F. Original file name. Same as %f except if a temporary file is used.
|
|
|
4143 |
|
|
|
4144 |
o %i. Internal path, for subdocuments of containers. The format depends
|
4209 |
* %i. Internal path, for subdocuments of containers. The format depends
|
4145 |
on the container type. If this appears in the command line, Recoll
|
4210 |
on the container type. If this appears in the command line, Recoll
|
4146 |
will not create a temporary file to extract the subdocument, expecting
|
4211 |
will not create a temporary file to extract the subdocument, expecting
|
4147 |
the called application (possibly a script) to be able to handle it.
|
4212 |
the called application (possibly a script) to be able to handle it.
|
4148 |
|
4213 |
|
4149 |
o %M. MIME type
|
4214 |
* %M. MIME type
|
4150 |
|
4215 |
|
4151 |
o %p. Page index. Only significant for a subset of document types,
|
4216 |
* %p. Page index. Only significant for a subset of document types,
|
4152 |
currently only PDF, Postscript and DVI files. Can be used to start the
|
4217 |
currently only PDF, Postscript and DVI files. Can be used to start the
|
4153 |
editor at the right page for a match or snippet.
|
4218 |
editor at the right page for a match or snippet.
|
4154 |
|
4219 |
|
4155 |
o %s. Search term. The value will only be set for documents with indexed
|
4220 |
* %s. Search term. The value will only be set for documents with indexed
|
4156 |
page numbers (ie: PDF). The value will be one of the matched search
|
4221 |
page numbers (ie: PDF). The value will be one of the matched search
|
4157 |
terms. It would allow pre-setting the value in the "Find" entry inside
|
4222 |
terms. It would allow pre-setting the value in the "Find" entry inside
|
4158 |
Evince for example, for easy highlighting of the term.
|
4223 |
Evince for example, for easy highlighting of the term.
|
4159 |
|
4224 |
|
4160 |
o %U, %u. Url.
|
4225 |
* %u. Url.
|
4161 |
|
4226 |
|
4162 |
In addition to the predefined values above, all strings like %(fieldname)
|
4227 |
In addition to the predefined values above, all strings like %(fieldname)
|
4163 |
will be replaced by the value of the field named fieldname for the
|
4228 |
will be replaced by the value of the field named fieldname for the
|
4164 |
document. This could be used in combination with field customisation to
|
4229 |
document. This could be used in combination with field customisation to
|
4165 |
help with opening the document.
|
4230 |
help with opening the document.
|
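To make the substitution rules above concrete, here is a minimal Python sketch of how a viewer command template might be expanded. It is an illustration under stated assumptions, not Recoll's code: the function expand_command and the two dictionaries (values for the single-letter codes, fields for %(fieldname) references) are hypothetical, and the sketch does not handle shell quoting.

```python
import re

# Matches either %(fieldname) or a single-letter code such as %f or %u.
_PLACEHOLDER = re.compile(r"%\((\w+)\)|%([A-Za-z])")


def expand_command(template, values, fields):
    """Expand placeholders in a command template (hypothetical helper)."""
    def replace(match):
        field_name, letter = match.group(1), match.group(2)
        if field_name is not None:
            # %(fieldname): value of the named document field, empty if unset.
            return fields.get(field_name, "")
        # %f, %u, %M, ...: predefined value; unknown codes are left untouched.
        return values.get(letter, match.group(0))
    return _PLACEHOLDER.sub(replace, template)


print(expand_command("blobviewer %f", {"f": "/tmp/mydoc.blob"}, {}))
```

For example, a template such as "viewer %u %(mailmytag)" would receive the document URL from values and the stored mailmytag field from fields, assuming that field was configured as shown in the fields file example.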
|
... |
|
... |
4192 |
the result list (when found by file name). The file names end in .blob and
|
4257 |
the result list (when found by file name). The file names end in .blob and
|
4193 |
can be displayed by application blobviewer.
|
4258 |
can be displayed by application blobviewer.
|
4194 |
|
4259 |
|
4195 |
You need two entries in the configuration files for this to work:
|
4260 |
You need two entries in the configuration files for this to work:
|
4196 |
|
4261 |
|
4197 |
o In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the
|
4262 |
* In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the
|
4198 |
following line:
|
4263 |
following line:
|
4199 |
|
4264 |
|
4200 |
.blob = application/x-blobapp
|
4265 |
.blob = application/x-blobapp
|
4201 |
|
4266 |
|
4202 |
Note that the MIME type is made up here, and you could call it
|
4267 |
Note that the MIME type is made up here, and you could call it
|
4203 |
diesel/oil just the same.
|
4268 |
diesel/oil just the same.
|
4204 |
|
4269 |
|
4205 |
o In $RECOLL_CONFDIR/mimeview under the [view] section, add:
|
4270 |
* In $RECOLL_CONFDIR/mimeview under the [view] section, add:
|
4206 |
|
4271 |
|
4207 |
application/x-blobapp = blobviewer %f
|
4272 |
application/x-blobapp = blobviewer %f
|
4208 |
|
4273 |
|
4209 |
We are supposing that blobviewer wants a file name parameter here, you
|
4274 |
We are supposing that blobviewer wants a file name parameter here, you
|
4210 |
would use %u if it liked URLs better.
|
4275 |
would use %u if it liked URLs better.
|
|
... |
|
... |
4221 |
text and that you know how to extract it with a command line program.
|
4286 |
text and that you know how to extract it with a command line program.
|
4222 |
Getting Recoll to index the files is easy. You need to perform the above
|
4287 |
Getting Recoll to index the files is easy. You need to perform the above
|
4223 |
alteration, and also to add data to the mimeconf file (typically in
|
4288 |
alteration, and also to add data to the mimeconf file (typically in
|
4224 |
~/.recoll/mimeconf):
|
4289 |
~/.recoll/mimeconf):
|
4225 |
|
4290 |
|
4226 |
o Under the [index] section, add the following line (more about the
|
4291 |
* Under the [index] section, add the following line (more about the
|
4227 |
rclblob indexing script later):
|
4292 |
rclblob indexing script later):
|
4228 |
|
4293 |
|
4229 |
application/x-blobapp = exec rclblob
|
4294 |
application/x-blobapp = exec rclblob
|
4230 |
|
4295 |
|
4231 |
o Under the [icons] section, you should choose an icon to be displayed
|
4296 |
* Under the [icons] section, you should choose an icon to be displayed
|
4232 |
for the files inside the result lists. Icons are normally 64x64 pixels
|
4297 |
for the files inside the result lists. Icons are normally 64x64 pixels
|
4233 |
PNG files which live in /usr/[local/]share/recoll/images.
|
4298 |
PNG files which live in /usr/[local/]share/recoll/images.
|
4234 |
|
4299 |
|
4235 |
o Under the [categories] section, you should add the MIME type where it
|
4300 |
* Under the [categories] section, you should add the MIME type where it
|
4236 |
makes sense (you can also create a category). Categories may be used
|
4301 |
makes sense (you can also create a category). Categories may be used
|
4237 |
for filtering in advanced search.
|
4302 |
for filtering in advanced search.
|
4238 |
|
4303 |
|
4239 |
The rclblob handler should be an executable program or script which exists
|
4304 |
The rclblob handler should be an executable program or script which exists
|
4240 |
inside /usr/[local/]share/recoll/filters. It will be given a file name as
|
4305 |
inside /usr/[local/]share/recoll/filters. It will be given a file name as
|