|
a/src/README |
|
b/src/README |
|
... |
|
... |
161 |
Chapter 1. Introduction
|
161 |
Chapter 1. Introduction
|
162 |
|
162 |
|
163 |
1.1. Giving it a try
|
163 |
1.1. Giving it a try
|
164 |
|
164 |
|
165 |
If you do not like reading manuals (who does?) and would like to give
|
165 |
If you do not like reading manuals (who does?) and would like to give
|
166 |
Recoll a try, just perform installation and start the recoll user
|
166 |
Recoll a try, just install the application and start the recoll graphical
|
167 |
interface, which will index your home directory by default, allowing you
|
167 |
user interface (GUI), which will ask to index your home directory by
|
168 |
to search immediately after indexing completes.
|
168 |
default, allowing you to search immediately after indexing completes.
|
169 |
|
169 |
|
170 |
Do not do this if your home directory contains a huge number of documents
|
170 |
Do not do this if your home directory contains a huge number of documents
|
171 |
and you do not want to wait or are very short on disk space. In this case,
|
171 |
and you do not want to wait or are very short on disk space. In this case,
|
172 |
you may first want to customize the configuration to restrict the indexed
|
172 |
you may first want to customize the configuration to restrict the indexed
|
173 |
area.
|
173 |
area.
|
|
... |
|
... |
265 |
default configuration will index your home directory with default
|
265 |
default configuration will index your home directory with default
|
266 |
parameters and should be sufficient for giving Recoll a try, but you may
|
266 |
parameters and should be sufficient for giving Recoll a try, but you may
|
267 |
want to adjust it later, which can be done either by editing the text
|
267 |
want to adjust it later, which can be done either by editing the text
|
268 |
files or by using configuration menus in the recoll GUI
|
268 |
files or by using configuration menus in the recoll GUI
|
269 |
|
269 |
|
270 |
Indexing is started automatically the first time you execute the recoll
|
270 |
The indexing process is started automatically the first time you execute
|
271 |
search graphical user interface, or by executing the recollindex command.
|
271 |
the recoll GUI. Indexing can also be performed by executing the
|
|
|
272 |
recollindex command.
|
272 |
|
273 |
|
273 |
Searches are usually performed inside the recoll graphical user interface
|
274 |
Searches are usually performed inside the recoll GUI, which has many
|
274 |
(GUI) program, which has many options to help you find what you are
|
275 |
options to help you find what you are looking for. However, there are
|
275 |
looking for. However, there are other ways to perform Recoll searches:
|
276 |
other ways to perform Recoll searches: mostly a command line interface, a
|
276 |
mostly a command line tool, a Python programming interface, and a KDE KIO
|
277 |
Python programming interface, a KDE KIO slave module, and a Ubuntu Unity
|
277 |
slave module.
|
278 |
Lens module.
|
278 |
|
279 |
|
279 |
----------------------------------------------------------------------
|
280 |
----------------------------------------------------------------------
|
280 |
|
281 |
|
281 |
Chapter 2. Indexing
|
282 |
Chapter 2. Indexing
|
282 |
|
283 |
|
|
... |
|
... |
309 |
Recoll knows about quite a few different document types. The parameters
|
310 |
Recoll knows about quite a few different document types. The parameters
|
310 |
for document types recognition and processing are set in configuration
|
311 |
for document types recognition and processing are set in configuration
|
311 |
files.
|
312 |
files.
|
312 |
|
313 |
|
313 |
Most file types, like HTML or word processing files, only hold one
|
314 |
Most file types, like HTML or word processing files, only hold one
|
314 |
document. Some file types, like mail folder files or zip archives, can
|
315 |
document. Some file types, like email folders or zip archives, can hold
|
315 |
hold many individually indexed documents, which may in turn be themselves
|
316 |
many individually indexed documents, which may in turn be themselves
|
316 |
compound ones. Such hierarchies can go quite deep, and Recoll has no
|
317 |
compound ones. Such hierarchies can go quite deep, and Recoll can process,
|
317 |
problem processing, for example, an ms-word document which would be an
|
318 |
for example, an ms-word document stored as an attachment to an email
|
318 |
attachment to an email message part of a folder file archived inside a zip
|
319 |
message inside an email folder archived in a zip file...
|
319 |
file...
|
|
|
320 |
|
320 |
|
321 |
Recoll indexing processes plain text, HTML, openoffice and e-mail files,
|
321 |
Recoll indexing processes plain text, HTML, OpenDocument
|
322 |
and a few others internally.
|
322 |
(Open/LibreOffice), email formats, and a few others internally.
|
323 |
|
323 |
|
324 |
Other file types (e.g. postscript, pdf, ms-word, rtf ...) need external
|
324 |
Other file types (e.g. postscript, pdf, ms-word, rtf ...) need external
|
325 |
applications for preprocessing. The list is in the installation section.
|
325 |
applications for preprocessing. The list is in the installation section.
|
326 |
After every indexing operation, Recoll updates a list of commands that
|
326 |
After every indexing operation, Recoll updates a list of commands that
|
327 |
would be needed for indexing existing file types. This list can be
|
327 |
would be needed for indexing existing file types. This list can be
|
328 |
displayed from the recoll File menu. It is stored in the missing text file
|
328 |
displayed by selecting the menu option File->Show Missing Helpers in the
|
329 |
inside the configuration directory.
|
329 |
recoll GUI. It is stored in the missing text file inside the configuration
|
|
|
330 |
directory.
|
330 |
|
331 |
|
331 |
Without further configuration, Recoll will index all appropriate files
|
332 |
Without further configuration, Recoll will index all appropriate files
|
332 |
from your home directory, with a reasonable set of defaults.
|
333 |
from your home directory, with a reasonable set of defaults.
|
333 |
|
334 |
|
334 |
In some cases, it may be interesting to index different areas of the file
|
335 |
In some cases, it may be interesting to index different areas of the file
|
|
... |
|
... |
385 |
the documents. It may also be much smaller if the documents contain a lot
|
386 |
the documents. It may also be much smaller if the documents contain a lot
|
386 |
of images or other non-indexed data (an extreme example being a set of mp3
|
387 |
of images or other non-indexed data (an extreme example being a set of mp3
|
387 |
files where only the tags would be indexed).
|
388 |
files where only the tags would be indexed).
|
388 |
|
389 |
|
389 |
Of course, images, sound and video do not increase the index size, which
|
390 |
Of course, images, sound and video do not increase the index size, which
|
390 |
means that it will be quite typical nowadays (2006), that even a big index
|
391 |
means that nowadays (2012), typically, even a big index will be negligible
|
391 |
will be negligible against the total amount of data on the computer.
|
392 |
against the total amount of data on the computer.
|
392 |
|
393 |
|
393 |
The index data directory (xapiandb) only contains data that can be
|
394 |
The index data directory (xapiandb) only contains data that can be
|
394 |
completely rebuilt by an index run (as long as the original documents
|
395 |
completely rebuilt by an index run (as long as the original documents
|
395 |
exist), and it can always be destroyed safely.
|
396 |
exist), and it can always be destroyed safely.
|
396 |
|
397 |
|
|
... |
|
... |
466 |
|
467 |
|
467 |
Most parameters for a given indexing configuration can be set from a
|
468 |
Most parameters for a given indexing configuration can be set from a
|
468 |
recoll GUI running on this configuration (either as default, or by setting
|
469 |
recoll GUI running on this configuration (either as default, or by setting
|
469 |
RECOLL_CONFDIR or the -c option.)
|
470 |
RECOLL_CONFDIR or the -c option.)
|
470 |
|
471 |
|
471 |
The interface is started from the Preferences menu. It has two main
|
472 |
The interface is started from the Preferences->Indexing Configuration menu
|
|
|
473 |
entry. It is divided in three tabs, Global parameters, Local parameters,
|
|
|
474 |
and Beagle web history, which is explained in the next section.
|
|
|
475 |
|
472 |
panels. The first panel allows setting global variables, like the list of
|
476 |
The first tab allows setting global variables, like the lists of top
|
473 |
top directories or the list of skipped paths. The second panel allows
|
477 |
directories, skipped paths, or stemming languages.
|
474 |
setting variables that can be redefined for subdirectories. This second
|
478 |
|
475 |
panel has an initially empty list of customisation directories, to which
|
479 |
The second tab allows setting variables that can be redefined for
|
476 |
you can add. The variables are then set for the currently selected
|
480 |
subdirectories. This second tab has an initially empty list of
|
477 |
directory (or at the top level if the empty line is selected).
|
481 |
customisation directories, to which you can add. The variables are then
|
|
|
482 |
set for the currently selected directory (or at the top level if the empty
|
|
|
483 |
line is selected).
|
478 |
|
484 |
|
479 |
The meaning for most entries in the interface is self-evident and
|
485 |
The meaning for most entries in the interface is self-evident and
|
480 |
documented by a ToolTip popup on the text label. For more detail, you will
|
486 |
documented by a ToolTip popup on the text label. For more detail, you will
|
481 |
need to refer to the configuration section of this guide.
|
487 |
need to refer to the configuration section of this guide.
|
482 |
|
488 |
|
|
... |
|
... |
527 |
|
533 |
|
528 |
If the recoll program finds no index when it starts, it will automatically
|
534 |
If the recoll program finds no index when it starts, it will automatically
|
529 |
start indexing (except if canceled).
|
535 |
start indexing (except if canceled).
|
530 |
|
536 |
|
531 |
The recollindex indexing process can be interrupted by sending an
|
537 |
The recollindex indexing process can be interrupted by sending an
|
532 |
interrupt (^C, SIGINT) or terminate (SIGTERM) signal. Some time may elapse
|
538 |
interrupt (Ctrl-C, SIGINT) or terminate (SIGTERM) signal. Some time may
|
533 |
before the process exits, because it needs to properly flush and close the
|
539 |
elapse before the process exits, because it needs to properly flush and
|
534 |
index. The indexing thread can be equivalently stopped from the menu.
|
540 |
close the index. This can also be done from the recoll GUI File->Stop
|
|
|
541 |
Indexing menu entry.
|
535 |
|
542 |
|
536 |
After such an interruption, the index will be somewhat inconsistent
|
543 |
After such an interruption, the index will be somewhat inconsistent
|
537 |
because some operations which are normally performed at the end of the
|
544 |
because some operations which are normally performed at the end of the
|
538 |
indexing pass will have been skipped (for exemple, the stemming and
|
545 |
indexing pass will have been skipped (for example, the stemming and
|
539 |
spelling databases will be nonexistent or out of date). You just need to
|
546 |
spelling databases will be nonexistent or out of date). You just need to
|
540 |
restart indexing at a later time to restore consistency. The indexing will
|
547 |
restart indexing at a later time to restore consistency. The indexing will
|
541 |
restart at the interruption point (the full file tree will be traversed,
|
548 |
restart at the interruption point (the full file tree will be traversed,
|
542 |
but files that were indexed up to the interruption and are still up to
|
549 |
but files that were indexed up to the interruption and are still up to
|
543 |
date will not need to be reindexed).
|
550 |
date will not need to be reindexed).
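
The interruption described above can be sketched from a script as follows.
This is only an illustration: the function name stop_indexer and the
60-second default timeout are assumptions, and the process handle is
assumed to come from a recollindex command started by the caller. Only
the use of SIGTERM (or SIGINT) and the delay before exit come from the
text above.

```python
import signal
import subprocess


def stop_indexer(proc: subprocess.Popen, timeout: float = 60.0) -> int:
    """Ask an indexing process to stop and wait until it has exited.

    `proc` is assumed to be a recollindex process started by the caller,
    for example with subprocess.Popen(["recollindex"]). The timeout value
    is an arbitrary choice for this sketch.
    """
    if proc.poll() is None:
        # SIGTERM (or SIGINT) requests a stop; the process may need some
        # time to flush and close the index before it actually exits.
        proc.send_signal(signal.SIGTERM)
    # Raises subprocess.TimeoutExpired if the process is still running.
    return proc.wait(timeout=timeout)
```

Running the indexer again at a later time restores consistency, as
explained above.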
|
|
... |
|
... |
675 |
toolbox bar icon) has multiple entry fields, which you may use to
|
682 |
toolbox bar icon) has multiple entry fields, which you may use to
|
676 |
build a logical condition, with additional filtering on file type and
|
683 |
build a logical condition, with additional filtering on file type and
|
677 |
location in the file system.
|
684 |
location in the file system.
|
678 |
|
685 |
|
679 |
In most cases, you can enter the terms as you think them, even if they
|
686 |
In most cases, you can enter the terms as you think them, even if they
|
680 |
contain embedded punctuation or other non-textual characters. For exemple,
|
687 |
contain embedded punctuation or other non-textual characters. For example,
|
681 |
Recoll can handle things like e-mail addresses, or arbitrary cut and paste
|
688 |
Recoll can handle things like email addresses, or arbitrary cut and paste
|
682 |
from another text window, punctuation and all.
|
689 |
from another text window, punctuation and all.
|
683 |
|
690 |
|
684 |
The main case where you should enter text differently from how it is
|
691 |
The main case where you should enter text differently from how it is
|
685 |
printed is for east-asian languages (Chinese, Japanese, Korean). Words
|
692 |
printed is for east-asian languages (Chinese, Japanese, Korean). Words
|
686 |
composed of single or multiple characters should be entered separated by
|
693 |
composed of single or multiple characters should be entered separated by
|
|
... |
|
... |
861 |
This entry is mainly useful for email attachments and permits viewing the
|
868 |
This entry is mainly useful for email attachments and permits viewing the
|
862 |
message to which the document is attached. Note that the entry will also
|
869 |
message to which the document is attached. Note that the entry will also
|
863 |
appear for an email which is part of an mbox folder file, but that you
|
870 |
appear for an email which is part of an mbox folder file, but that you
|
864 |
can't actually visualize the folder (there will be an error dialog if you
|
871 |
can't actually visualize the folder (there will be an error dialog if you
|
865 |
try). Recoll is unfortunately not yet smart enough to disable the entry in
|
872 |
try). Recoll is unfortunately not yet smart enough to disable the entry in
|
866 |
this case. In other cases, the Open option makes sense, for exemple to
|
873 |
this case. In other cases, the Open option makes sense, for example to
|
867 |
start a chm viewer on the parent document for a help page.
|
874 |
start a chm viewer on the parent document for a help page.
|
868 |
|
875 |
|
869 |
----------------------------------------------------------------------
|
876 |
----------------------------------------------------------------------
|
870 |
|
877 |
|
871 |
3.1.3. The result table
|
878 |
3.1.3. The result table
|
|
... |
|
... |
905 |
will open a new window for side by side viewing).
|
912 |
will open a new window for side by side viewing).
|
906 |
|
913 |
|
907 |
Starting another search and requesting a preview will create a new preview
|
914 |
Starting another search and requesting a preview will create a new preview
|
908 |
window. The old one stays open until you close it.
|
915 |
window. The old one stays open until you close it.
|
909 |
|
916 |
|
910 |
You can close a preview tab by typing ^W (Ctrl + W) in the window. Closing
|
917 |
You can close a preview tab by typing Ctrl-W (Ctrl + W) in the window.
|
911 |
the last tab for a window will also close the window.
|
918 |
Closing the last tab for a window will also close the window.
|
912 |
|
919 |
|
913 |
Of course you can also close a preview window by using the window manager
|
920 |
Of course you can also close a preview window by using the window manager
|
914 |
button in the top of the frame.
|
921 |
button in the top of the frame.
|
915 |
|
922 |
|
916 |
You can display successive or previous documents from the result list
|
923 |
You can display successive or previous documents from the result list
|
|
... |
|
... |
922 |
area or by clicking into the Search for: text field and entering the
|
929 |
area or by clicking into the Search for: text field and entering the
|
923 |
search string. You can then use the Next and Previous buttons to find the
|
930 |
search string. You can then use the Next and Previous buttons to find the
|
924 |
next/previous occurrence. You can also type F3 inside the text area to get
|
931 |
next/previous occurrence. You can also type F3 inside the text area to get
|
925 |
to the next occurrence.
|
932 |
to the next occurrence.
|
926 |
|
933 |
|
927 |
If you have a search string entered and you use ^Up/^Down to browse the
|
934 |
If you have a search string entered and you use Ctrl-Up/Ctrl-Down to
|
928 |
results, the search is initiated for each successive document. If the
|
935 |
browse the results, the search is initiated for each successive document.
|
929 |
string is found, the cursor will be positioned at the first occurrence of
|
936 |
If the string is found, the cursor will be positioned at the first
|
930 |
the search string.
|
937 |
occurrence of the search string.
|
931 |
|
938 |
|
932 |
A right-click menu in the text area allows switching between displaying
|
939 |
A right-click menu in the text area allows switching between displaying
|
933 |
the main text or the contents of fields associated with the document (e.g.
|
940 |
the main text or the contents of fields associated with the document (e.g.
|
934 |
author, abstract, etc.). This is especially useful in cases where the term
|
941 |
author, abstract, etc.). This is especially useful in cases where the term
|
935 |
match did not occur in the main text but in one of the fields.
|
942 |
match did not occur in the main text but in one of the fields.
|
936 |
|
943 |
|
937 |
You can print the current preview window contents by typing ^P (Ctrl + P)
|
944 |
You can print the current preview window contents by typing Ctrl-P (Ctrl +
|
938 |
in the window text.
|
945 |
P) in the window text.
|
939 |
|
946 |
|
940 |
----------------------------------------------------------------------
|
947 |
----------------------------------------------------------------------
|
941 |
|
948 |
|
942 |
3.1.5. Complex/advanced search
|
949 |
3.1.5. Complex/advanced search
|
943 |
|
950 |
|
|
... |
|
... |
1279 |
|
1286 |
|
1280 |
Forced opening of a preview window. You can use Shift+Click on a result
|
1287 |
Forced opening of a preview window. You can use Shift+Click on a result
|
1281 |
list Preview link to force the creation of a preview window instead of a
|
1288 |
list Preview link to force the creation of a preview window instead of a
|
1282 |
new tab in the existing one.
|
1289 |
new tab in the existing one.
|
1283 |
|
1290 |
|
1284 |
Closing previews. Entering ^W in a tab will close it (and, for the last
|
1291 |
Closing previews. Entering Ctrl-W in a tab will close it (and, for the
|
1285 |
tab, close the preview window). Entering Esc will close the preview window
|
1292 |
last tab, close the preview window). Entering Esc will close the preview
|
1286 |
and all its tabs.
|
1293 |
window and all its tabs.
|
1287 |
|
1294 |
|
1288 |
Printing previews. Entering ^P in a preview window will print the
|
1295 |
Printing previews. Entering Ctrl-P in a preview window will print the
|
1289 |
currently displayed text.
|
1296 |
currently displayed text.
|
1290 |
|
1297 |
|
1291 |
Quitting. Entering ^Q almost anywhere will close the application.
|
1298 |
Quitting. Entering Ctrl-Q almost anywhere will close the application.
|
1292 |
|
1299 |
|
1293 |
----------------------------------------------------------------------
|
1300 |
----------------------------------------------------------------------
|
1294 |
|
1301 |
|
1295 |
3.1.11. Customizing the search interface
|
1302 |
3.1.11. Customizing the search interface
|
1296 |
|
1303 |
|
|
... |
|
... |
1310 |
|
1317 |
|
1311 |
* Style sheet: The name of a Qt style sheet text file which is applied
|
1318 |
* Style sheet: The name of a Qt style sheet text file which is applied
|
1312 |
to the whole Recoll application on startup. The default value is
|
1319 |
to the whole Recoll application on startup. The default value is
|
1313 |
empty, but there is a skeleton style sheet (recoll.qss) inside the
|
1320 |
empty, but there is a skeleton style sheet (recoll.qss) inside the
|
1314 |
/usr/share/recoll/examples directory. Using a style sheet, you can
|
1321 |
/usr/share/recoll/examples directory. Using a style sheet, you can
|
1315 |
change most Recoll graphical parameters: colors, fonts, etc. See the
|
1322 |
change most recoll graphical parameters: colors, fonts, etc. See the
|
1316 |
sample file for a few simple examples.
|
1323 |
sample file for a few simple examples.
|
1317 |
|
1324 |
|
1318 |
* Maximum text size highlighted for preview Inserting highlights on
|
1325 |
* Maximum text size highlighted for preview Inserting highlights on
|
1319 |
search term inside the text before inserting it in the preview window
|
1326 |
search term inside the text before inserting it in the preview window
|
1320 |
involves quite a lot of processing, and can be disabled over the given
|
1327 |
involves quite a lot of processing, and can be disabled over the given
|
|
... |
|
... |
1465 |
displayed.
|
1472 |
displayed.
|
1466 |
|
1473 |
|
1467 |
No more detail will be given about the header part (only useful with the
|
1474 |
No more detail will be given about the header part (only useful with the
|
1468 |
WebKit build), if there are restrictions to what you can do, they are
|
1475 |
WebKit build), if there are restrictions to what you can do, they are
|
1469 |
beyond this author's HTML/CSS/Javascript abilities... There are a few
|
1476 |
beyond this author's HTML/CSS/Javascript abilities... There are a few
|
1470 |
exemples on the page about customising the result list on the Recoll web
|
1477 |
examples on the page about customising the result list on the Recoll web
|
1471 |
site.
|
1478 |
site.
|
1472 |
|
1479 |
|
1473 |
----------------------------------------------------------------------
|
1480 |
----------------------------------------------------------------------
|
1474 |
|
1481 |
|
1475 |
3.1.11.1.1. The paragraph format
|
1482 |
3.1.11.1.1. The paragraph format
|
|
... |
|
... |
1700 |
ie: the From: header, for an email message), and containing either beatles
|
1707 |
ie: the From: header, for an email message), and containing either beatles
|
1701 |
or lennon and either live or unplugged but not potatoes (in any part of
|
1708 |
or lennon and either live or unplugged but not potatoes (in any part of
|
1702 |
the document).
|
1709 |
the document).
|
1703 |
|
1710 |
|
1704 |
An element is composed of an optional field specification, and a value,
|
1711 |
An element is composed of an optional field specification, and a value,
|
1705 |
separated by a colon. Exemple: Beatles, author:balzac, dc:title:grandet
|
1712 |
separated by a colon. Example: Beatles, author:balzac, dc:title:grandet
|
1706 |
|
1713 |
|
1707 |
The colon, if present, means "contains". Xesam defines other relations,
|
1714 |
The colon, if present, means "contains". Xesam defines other relations,
|
1708 |
which are not supported for now.
|
1715 |
which are not supported for now.
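
To make the element syntax concrete, here is a small Python sketch that
splits an element into its optional field specification and its value.
It assumes that the value starts after the last colon that precedes any
quoted phrase, which is what the dc:title:grandet example suggests. This
is an interpretation of the description above, not Recoll's actual
parser, and the name split_element is hypothetical.

```python
from typing import Optional, Tuple


def split_element(element: str) -> Tuple[Optional[str], str]:
    """Split 'field:value' into (field, value); field is None if absent."""
    # Colons inside a quoted phrase belong to the value, so only look at
    # the part of the element that precedes the first double quote.
    quote = element.find('"')
    head = element if quote == -1 else element[:quote]
    field, sep, _ = head.rpartition(":")
    if not sep:
        return None, element
    return field, element[len(field) + 1:]


for text in ("Beatles", "author:balzac", "dc:title:grandet",
             'title:"prejudice pride"'):
    print(text, "->", split_element(text))
```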
|
1709 |
|
1716 |
|
1710 |
All elements in the search entry are normally combined with an implicit
|
1717 |
All elements in the search entry are normally combined with an implicit
|
|
... |
|
... |
1719 |
|
1726 |
|
1720 |
As usual, words inside quotes define a phrase (the order of words is
|
1727 |
As usual, words inside quotes define a phrase (the order of words is
|
1721 |
significant), so that title:"prejudice pride" is not the same as
|
1728 |
significant), so that title:"prejudice pride" is not the same as
|
1722 |
title:prejudice title:pride, and is unlikely to find a result.
|
1729 |
title:prejudice title:pride, and is unlikely to find a result.
|
1723 |
|
1730 |
|
1724 |
Modifiers can be set on a phrase clause, for exemple to specify a
|
1731 |
Modifiers can be set on a phrase clause, for example to specify a
|
1725 |
proximity search (unordered). See the modifier section.
|
1732 |
proximity search (unordered). See the modifier section.
|
1726 |
|
1733 |
|
1727 |
Recoll currently manages the following default fields:
|
1734 |
Recoll currently manages the following default fields:
|
1728 |
|
1735 |
|
1729 |
* title, subject or caption are synonyms which specify data to be
|
1736 |
* title, subject or caption are synonyms which specify data to be
|
|
... |
|
... |
1749 |
field and only one value makes sense in a query (you can't use
|
1756 |
field and only one value makes sense in a query (you can't use
|
1750 |
dir:dir1 OR dir:dir2). Relative paths make sense, for example,
|
1757 |
dir:dir1 OR dir:dir2). Relative paths make sense, for example,
|
1751 |
dir:share/doc would match either /usr/share/doc or
|
1758 |
dir:share/doc would match either /usr/share/doc or
|
1752 |
/usr/local/share/doc
|
1759 |
/usr/local/share/doc
|
1753 |
|
1760 |
|
1754 |
* size for filtering the results on file size. Exemple: size<10000. You
|
1761 |
* size for filtering the results on file size. Example: size<10000. You
|
1755 |
can use <, > or = as operators. You can specify a range like the
|
1762 |
can use <, > or = as operators. You can specify a range like the
|
1756 |
following: size>100 size<1000. The usual k/K, m/M, g/G, t/T can be
|
1763 |
following: size>100 size<1000. The usual k/K, m/M, g/G, t/T can be
|
1757 |
used as (decimal) multipliers. Ex: size>1k to search for files bigger
|
1764 |
used as (decimal) multipliers. Ex: size>1k to search for files bigger
|
1758 |
than 1000 bytes.
|
1765 |
than 1000 bytes.
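
As an illustration of the size syntax, the following Python sketch
converts a size clause into an operator and a byte count, using the
decimal multipliers described above. The function names are hypothetical
and the code only models the documented syntax; it is not taken from the
Recoll sources.

```python
import operator
import re
from typing import List, Tuple

# Decimal multipliers, as stated above: k/K, m/M, g/G, t/T.
_MULTIPLIERS = {"": 1, "k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12}
_OPERATORS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}
_SIZE_CLAUSE = re.compile(r"size([<>=])(\d+)([kKmMgGtT]?)")


def parse_size_clause(clause: str) -> Tuple[str, int]:
    """Return (operator, size in bytes) for a clause such as 'size>1k'."""
    m = _SIZE_CLAUSE.fullmatch(clause)
    if m is None:
        raise ValueError(f"not a size clause: {clause!r}")
    op, digits, suffix = m.groups()
    return op, int(digits) * _MULTIPLIERS[suffix.lower()]


def size_matches(size: int, clauses: List[str]) -> bool:
    """True if `size` satisfies every clause (several clauses form a range)."""
    for clause in clauses:
        op, limit = parse_size_clause(clause)
        if not _OPERATORS[op](size, limit):
            return False
    return True
```

With these definitions, parse_size_clause("size>1k") yields (">", 1000),
and size_matches(500, ["size>100", "size<1000"]) is true.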
|
1759 |
|
1766 |
|
|
... |
|
... |
1764 |
time. Periods are specified as PnYnMnD. The n numbers are the
|
1771 |
time. Periods are specified as PnYnMnD. The n numbers are the
|
1765 |
respective numbers of years, months or days, any of which may be
|
1772 |
respective numbers of years, months or days, any of which may be
|
1766 |
missing. Dates are specified as YYYY-MM-DD. The days and months parts
|
1773 |
missing. Dates are specified as YYYY-MM-DD. The days and months parts
|
1767 |
may be missing. If the / is present but an element is missing, the
|
1774 |
may be missing. If the / is present but an element is missing, the
|
1768 |
missing element is interpreted as the lowest or highest date in the
|
1775 |
missing element is interpreted as the lowest or highest date in the
|
1769 |
index. Exemples:
|
1776 |
index. Examples:
|
1770 |
|
1777 |
|
1771 |
* 2001-03-01/2002-05-01 the basic syntax for an interval of dates.
|
1778 |
* 2001-03-01/2002-05-01 the basic syntax for an interval of dates.
|
1772 |
|
1779 |
|
1773 |
* 2001-03-01/P1Y2M the same specified with a period.
|
1780 |
* 2001-03-01/P1Y2M the same specified with a period.
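
The two examples above can be checked with a short Python sketch. It
handles only complete YYYY-MM-DD dates and a period on the right-hand
side, and it represents a missing element as None (standing for the
lowest or highest date in the index). The function names, and the choice
to clamp the day when the target month is shorter, are assumptions of
this sketch and not a description of how Recoll computes the interval.

```python
import calendar
import re
from datetime import date, timedelta
from typing import Optional, Tuple

_PERIOD = re.compile(r"P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?")


def add_period(start: date, period: str) -> date:
    """Add a PnYnMnD period to a date."""
    m = _PERIOD.fullmatch(period)
    if m is None or period == "P":
        raise ValueError(f"not a PnYnMnD period: {period!r}")
    years, months, days = (int(g) if g else 0 for g in m.groups())
    # Move by whole months first, then clamp the day to the month length.
    month_index = start.year * 12 + (start.month - 1) + years * 12 + months
    year, month0 = divmod(month_index, 12)
    day = min(start.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day) + timedelta(days=days)


def parse_interval(text: str) -> Tuple[Optional[date], Optional[date]]:
    """Parse 'begin/end' where end is a date, a period, or missing."""
    if "/" not in text:
        raise ValueError(f"not an interval: {text!r}")
    left, _, right = text.partition("/")
    begin = date.fromisoformat(left) if left else None
    if not right:
        end = None
    elif right.startswith("P"):
        if begin is None:
            raise ValueError("a period needs a start date in this sketch")
        end = add_period(begin, right)
    else:
        end = date.fromisoformat(right)
    return begin, end
```

Under these assumptions, parse_interval("2001-03-01/P1Y2M") and
parse_interval("2001-03-01/2002-05-01") return the same pair of dates.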
|
1774 |
|
1781 |
|
|
... |
|
... |
2007 |
the filter if the operation is for indexing or previewing. Some filters
|
2014 |
the filter if the operation is for indexing or previewing. Some filters
|
2008 |
use this to output a slightly different format, for example stripping
|
2015 |
use this to output a slightly different format, for example stripping
|
2009 |
uninteresting repeated keywords (ie: Subject: for email) when indexing.
|
2016 |
uninteresting repeated keywords (ie: Subject: for email) when indexing.
|
2010 |
This is not essential.
|
2017 |
This is not essential.
|
2011 |
|
2018 |
|
2012 |
You should look to one of the simple filters, for exemple rclps for a
|
2019 |
You should look to one of the simple filters, for example rclps for a
|
2013 |
starting point.
|
2020 |
starting point.
|
2014 |
|
2021 |
|
2015 |
Don't forget to make your filter executable before testing!
|
2022 |
Don't forget to make your filter executable before testing!
|
2016 |
|
2023 |
|
2017 |
----------------------------------------------------------------------
|
2024 |
----------------------------------------------------------------------
|
|
... |
|
... |
2435 |
In all cases, the strict software dependencies (ie on Xapian or iconv)
|
2442 |
In all cases, the strict software dependencies (ie on Xapian or iconv)
|
2436 |
will be automatically satisfied, you should not have to worry about them.
|
2443 |
will be automatically satisfied, you should not have to worry about them.
|
2437 |
|
2444 |
|
2438 |
You will only have to check or install supporting applications for the
|
2445 |
You will only have to check or install supporting applications for the
|
2439 |
file types that you want to index beyond those that are natively processed
|
2446 |
file types that you want to index beyond those that are natively processed
|
2440 |
by Recoll (text, HTML, mail files, and a few others).
|
2447 |
by Recoll (text, HTML, email files, and a few others).
|
2441 |
|
2448 |
|
2442 |
You should also maybe have a look at the configuration section (but this
|
2449 |
You should also maybe have a look at the configuration section (but this
|
2443 |
may not be necessary for a quick test with default parameters). Most
|
2450 |
may not be necessary for a quick test with default parameters). Most
|
2444 |
parameters can be more conveniently set from the GUI interface.
|
2451 |
parameters can be more conveniently set from the GUI interface.
|
2445 |
|
2452 |
|
|
... |
|
... |
2557 |
|
2564 |
|
2558 |
* Midi karaoke files need Python and the Midi module
|
2565 |
* Midi karaoke files need Python and the Midi module
|
2559 |
|
2566 |
|
2560 |
* Konqueror webarchive format with Python (uses the Tarfile module).
|
2567 |
* Konqueror webarchive format with Python (uses the Tarfile module).
|
2561 |
|
2568 |
|
2562 |
* mimehtml web archive format (support based on the mail filter, which
|
2569 |
* mimehtml web archive format (support based on the email filter, which
|
2563 |
introduces some mild weirdness, but still usable).
|
2570 |
introduces some mild weirdness, but still usable).
|
2564 |
|
2571 |
|
2565 |
Text, HTML, mail folders, and Scribus files are processed internally. Lyx
|
2572 |
Text, HTML, email folders, and Scribus files are processed internally. Lyx
|
2566 |
is used to index Lyx files. Many filters need iconv and the standard sed
|
2573 |
is used to index Lyx files. Many filters need iconv and the standard sed
|
2567 |
and awk.
|
2574 |
and awk.
|
2568 |
|
2575 |
|
2569 |
----------------------------------------------------------------------
|
2576 |
----------------------------------------------------------------------
|
2570 |
|
2577 |
|
|
... |
|
... |
2764 |
expanded to the name of the user's home directory, as a shell would do.
|
2771 |
expanded to the name of the user's home directory, as a shell would do.
|
2765 |
|
2772 |
|
2766 |
White space is used for separation inside lists. List elements with
|
2773 |
White space is used for separation inside lists. List elements with
|
2767 |
embedded spaces can be quoted using double-quotes.
|
2774 |
embedded spaces can be quoted using double-quotes.
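
The list syntax can be approximated in Python with the standard shlex
module, as sketched below. This is an approximation only: shlex also
accepts single quotes and backslash escapes, which the text above does
not mention, and the sketch assumes that tilde expansion applies to each
list element. The function name and the sample value are hypothetical.

```python
import os
import shlex
from typing import List


def parse_list_value(value: str) -> List[str]:
    """Split a white-space separated list, honouring double quotes."""
    # shlex.split keeps a double-quoted element with embedded spaces
    # together and removes the quotes.
    items = shlex.split(value)
    # Expand a leading '~' to the home directory, as a shell would do.
    return [os.path.expanduser(item) for item in items]


print(parse_list_value('~/docs "/mnt/My Files" ~/.thunderbird'))
```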
|
2768 |
|
2775 |
|
|
|
2776 |
Encoding issues. Most of the configuration parameters are plain ASCII. Two
|
|
|
2777 |
particular sets of values may cause encoding issues:
|
|
|
2778 |
|
|
|
2779 |
* File path parameters may contain non-ascii characters and should use
|
|
|
2780 |
the exact same byte values as found in the file system directory.
|
|
|
2781 |
Usually, this means that the configuration file should use the system
|
|
|
2782 |
default locale encoding.
|
|
|
2783 |
|
|
|
2784 |
* The unac_except_trans parameter should be encoded in UTF-8. If your
|
|
|
2785 |
system locale is not UTF-8, and you need to also specify non-ascii
|
|
|
2786 |
file paths, this poses a difficulty because common text editors cannot
|
|
|
2787 |
handle multiple encodings in a single file. In this relatively
|
|
|
2788 |
unlikely case, you can edit the configuration file as two separate
|
|
|
2789 |
text files with appropriate encodings, and concatenate them to create
|
|
|
2790 |
the complete configuration.
|
|
|
2791 |
|
2769 |
----------------------------------------------------------------------
|
2792 |
----------------------------------------------------------------------
|
2770 |
|
2793 |
|
2771 |
5.4.1. Main configuration file
|
2794 |
5.4.1. Main configuration file
|
2772 |
|
2795 |
|
2773 |
recoll.conf is the main configuration file. It defines things like what to
|
2796 |
recoll.conf is the main configuration file. It defines things like what to
|
|
... |
|
... |
2811 |
a directory in topdirs might match and would still be indexed).
|
2834 |
a directory in topdirs might match and would still be indexed).
|
2812 |
|
2835 |
|
2813 |
The list in the default configuration does not exclude hidden
|
2836 |
The list in the default configuration does not exclude hidden
|
2814 |
directories (names beginning with a dot), which means that it may
|
2837 |
directories (names beginning with a dot), which means that it may
|
2815 |
index quite a few things that you do not want. On the other hand,
|
2838 |
index quite a few things that you do not want. On the other hand,
|
2816 |
mail user agents like thunderbird usually store messages in hidden
|
2839 |
email user agents like thunderbird usually store messages in
|
2817 |
directories, and you probably want this indexed. One possible
|
2840 |
hidden directories, and you probably want this indexed. One
|
2818 |
solution is to have .* in skippedNames, and add things like
|
2841 |
possible solution is to have .* in skippedNames, and add things
|
2819 |
~/.thunderbird or ~/.evolution in topdirs.
|
2842 |
like ~/.thunderbird or ~/.evolution in topdirs.
|
2820 |
|
2843 |
|
2821 |
Not even the file names are indexed for patterns in this list. See
|
2844 |
Not even the file names are indexed for patterns in this list. See
|
2822 |
the recoll_noindex variable in mimemap for an alternative approach
|
2845 |
the recoll_noindex variable in mimemap for an alternative approach
|
2823 |
which indexes the file names.
|
2846 |
which indexes the file names.
|
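The approach suggested above for hidden directories could look like the
following sketch. The directory list is only an example. It is assumed
here that assigning skippedNames replaces the default pattern list rather
than extending it, so any default patterns you still want would need to
be repeated:

```ini
# Index the home directory, plus two hidden mail directories that
# would otherwise be excluded by the .* pattern below.
topdirs = ~ ~/.thunderbird ~/.evolution

# Skip every file or directory whose name begins with a dot.
skippedNames = .*
```

This relies on the behavior described above: a directory listed
explicitly in topdirs is still indexed even if its name matches a
skippedNames pattern.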
2824 |
|
2847 |
|
|
... |
|
... |
2963 |
character set definition (i.e. plain text files). This can be
|
2986 |
character set definition (i.e. plain text files). This can be
|
2964 |
redefined for any sub-directory. If it is not set at all, the
|
2987 |
redefined for any sub-directory. If it is not set at all, the
|
2965 |
character set used is the one defined by the NLS environment
|
2988 |
character set used is the one defined by the NLS environment
|
2966 |
(LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
|
2989 |
(LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
|
2967 |
|
2990 |
|
|
|
2991 |
unac_except_trans
|
|
|
2992 |
|
|
|
2993 |
This is a list of characters, encoded in UTF-8, which should be
|
|
|
2994 |
handled specially when converting text to unaccented lowercase.
|
|
|
2995 |
For example, in Swedish, the letter a with diaeresis has full
|
|
|
2996 |
alphabet citizenship and should not be turned into an a. Each
|
|
|
2997 |
element in the space-separated list has the special character as
|
|
|
2998 |
first element and the translation following. The handling of both
|
|
|
2999 |
the lowercase and uppercase versions of a character should be
|
|
|
3000 |
specified, as membership in the list will turn off both standard
|
|
|
3001 |
accent and case processing. Example for Swedish:
|
|
|
3002 |
|
|
|
3003 |
unac_except_trans = åå Åå ää Ää öö Öö
|
|
|
3004 |
|
|
|
3005 |
|
|
|
3006 |
Note that the translation is not limited to a single character:
|
|
|
3007 |
you could very well have something like üue in the list.
|
|
|
3008 |
|
|
|
3009 |
This parameter is global and cannot be set for subdirectories,
|
|
|
3010 |
because there is no way to do otherwise when querying. If you have
|
|
|
3011 |
document sets which would need different values, you will have to
|
|
|
3012 |
index and query them separately.
|
|
|
3013 |
|
2968 |
maildefcharset
|
3014 |
maildefcharset
|
2969 |
|
3015 |
|
2970 |
This can be used to define the default character set specifically
|
3016 |
This can be used to define the default character set specifically
|
2971 |
for mail messages which don't specify it. This is mainly useful
|
3017 |
for email messages which don't specify it. This is mainly useful
|
2972 |
for readpst (libpst) dumps, which are utf-8 but do not say so.
|
3018 |
for readpst (libpst) dumps, which are utf-8 but do not say so.
|
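For instance, to treat email messages that do not declare a character set
as UTF-8 (the readpst case mentioned above), the setting might look like
the following sketch. The exact spelling accepted for the character set
name is an assumption here:

```ini
# Default character set for email messages which do not specify one.
maildefcharset = utf-8
```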
2973 |
|
3019 |
|
2974 |
localfields
|
3020 |
localfields
|
2975 |
|
3021 |
|
2976 |
This allows setting fields for all documents under a given
|
3022 |
This allows setting fields for all documents under a given
|
|
... |
|
... |
3158 |
used inside the [prefixes] and [stored] sections
|
3204 |
used inside the [prefixes] and [stored] sections
|
3159 |
|
3205 |
|
3160 |
filter-specific sections
|
3206 |
filter-specific sections
|
3161 |
|
3207 |
|
3162 |
Some filters may need specific configuration for handling fields.
|
3208 |
Some filters may need specific configuration for handling fields.
|
3163 |
Only the mail message filter currently has such a section (named
|
3209 |
Only the email message filter currently has such a section (named
|
3164 |
[mail]). It allows indexing arbitrary mail headers in addition to
|
3210 |
[mail]). It allows indexing arbitrary email headers in addition to
|
3165 |
the ones indexed by default. Other such sections may appear in the
|
3211 |
the ones indexed by default. Other such sections may appear in the
|
3166 |
future.
|
3212 |
future.
|
3167 |
|
3213 |
|
3168 |
Here follows a small example of a personal fields file. This would extract
|
3214 |
Here follows a small example of a personal fields file. This would extract
|
3169 |
a specific mail header and use it as a searchable field, with data
|
3215 |
a specific email header and use it as a searchable field, with data
|
3170 |
displayable inside result lists. (Side note: as the mail filter does no
|
3216 |
displayable inside result lists. (Side note: as the email filter does no
|
3171 |
decoding on the values, only plain ASCII headers can be indexed, and only
|
3217 |
decoding on the values, only plain ASCII headers can be indexed, and only
|
3172 |
the first occurrence will be used for headers that occur several times).
|
3218 |
the first occurrence will be used for headers that occur several times).
|
3173 |
|
3219 |
|
3174 |
[prefixes]
|
3220 |
[prefixes]
|
3175 |
# Index mailmytag contents (with the given prefix)
|
3221 |
# Index mailmytag contents (with the given prefix)
|