```diff
--- a/src/INSTALL
+++ b/src/INSTALL
@@ -37,11 +37,11 @@
 In all cases, the strict software dependancies (ie on Xapian or iconv)
 will be automatically satisfied, you should not have to worry about them.
 
 You will only have to check or install supporting applications for the
 file types that you want to index beyond those that are natively processed
-by Recoll (text, HTML, mail files, and a few others).
+by Recoll (text, HTML, email files, and a few others).
 
 You should also maybe have a look at the configuration section (but this
 may not be necessary for a quick test with default parameters). Most
 parameters can be more conveniently set from the GUI interface.
 
```
```diff
@@ -167,14 +167,14 @@
 
 * Midi karaoke files need Python and the Midi module
 
 * Konqueror webarchive format with Python (uses the Tarfile module).
 
-* mimehtml web archive format (support based on the mail filter, which
+* mimehtml web archive format (support based on the email filter, which
 introduces some mild weirdness, but still usable).
 
-Text, HTML, mail folders, and Scribus files are processed internally. Lyx
+Text, HTML, email folders, and Scribus files are processed internally. Lyx
 is used to index Lyx files. Many filters need iconv and the standard sed
 and awk.
 
 --------------------------------------------------------------------------
 
```
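The hunk above says that many filters rely on iconv, sed and awk, and that some formats need Python modules. Konqueror webarchives, for example, need the Tarfile module. The following is a minimal sketch for checking a host before indexing. It assumes external commands are looked up on PATH. The command and module lists are illustrative and are not taken from Recoll itself. The Midi module is left out because its import name is not given here.

```python
import importlib.util
import shutil

# Commands the INSTALL text names as needed by many filters (illustrative list).
REQUIRED_COMMANDS = ["iconv", "sed", "awk"]
# Standard-library module behind the "Tarfile module" mentioned for webarchives.
REQUIRED_MODULES = ["tarfile"]


def check_helpers(commands=REQUIRED_COMMANDS, modules=REQUIRED_MODULES):
    """Return {name: available} for each external command and Python module."""
    status = {}
    for cmd in commands:
        # shutil.which searches PATH the way a shell would.
        status[cmd] = shutil.which(cmd) is not None
    for mod in modules:
        # find_spec locates a module without importing it.
        status[mod] = importlib.util.find_spec(mod) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_helpers().items():
        print(f"{name:10} {'found' if ok else 'MISSING'}")
```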
```diff
@@ -393,10 +393,26 @@
 expanded to the name of the user's home directory, as a shell would do.
 
 White space is used for separation inside lists. List elements with
 embedded spaces can be quoted using double-quotes.
 
+Encoding issues. Most of the configuration parameters are plain ASCII. Two
+particular sets of values may cause encoding issues:
+
+* File path parameters may contain non-ascii characters and should use
+the exact same byte values as found in the file system directory.
+Usually, this means that the configuration file should use the system
+default locale encoding.
+
+* The unac_except_trans parameter should be encoded in UTF-8. If your
+system locale is not UTF-8, and you need to also specify non-ascii
+file paths, this poses a difficulty because common text editors cannot
+handle multiple encodings in a single file. In this relatively
+unlikely case, you can edit the configuration file as two separate
+text files with appropriate encodings, and concatenate them to create
+the complete configuration.
+
 5.4.1. Main configuration file
 
 recoll.conf is the main configuration file. It defines things like what to
 index (top directories and things to ignore), and the default character
 set to use for document types which do not specify it internally.
```
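The list-quoting rule and the "two files, two encodings" advice from this hunk can be illustrated with a small Python sketch.

- `parse_list` approximates Recoll's list splitting with `shlex.split`: white space separates elements and double quotes protect embedded spaces. It then applies tilde expansion with `os.path.expanduser`, assuming the value is a path list such as topdirs. Recoll's own parser may treat backslashes or single quotes differently, so this is an approximation only.
- `concat_parts` joins the fragments as raw bytes and never decodes them, so each fragment keeps its own encoding.

The fragment file names and the Swedish characters in the demo are hypothetical.

```python
import os
import shlex
import tempfile
from pathlib import Path


def parse_list(value, expand_tilde=True):
    """Split a list-valued parameter into elements.

    White space separates elements and double quotes protect embedded
    spaces; shlex.split is used as an approximation of Recoll's parser."""
    items = shlex.split(value)
    if expand_tilde:
        # A leading "~" becomes the user's home directory, as in a shell.
        items = [os.path.expanduser(item) for item in items]
    return items


def concat_parts(parts, output):
    """Join configuration fragments byte for byte, without decoding them,
    so each fragment keeps its own encoding in the combined file."""
    data = bytearray()
    for part in parts:
        chunk = Path(part).read_bytes()
        data += chunk
        # Stop the last line of one fragment from running into the next.
        if chunk and not chunk.endswith(b"\n"):
            data += b"\n"
    Path(output).write_bytes(bytes(data))


if __name__ == "__main__":
    print(parse_list('~/docs "~/My Documents"'))
    with tempfile.TemporaryDirectory() as tmp:
        paths_part = Path(tmp, "paths.part")  # hypothetical fragment names
        unac_part = Path(tmp, "unac.part")
        # Path list in a Latin-1 locale encoding, exception list in UTF-8.
        paths_part.write_bytes('topdirs = "~/Dokument/Året"\n'.encode("iso8859-1"))
        unac_part.write_bytes("unac_except_trans = åå Åå ää Ää öö Öö\n".encode("utf-8"))
        combined = Path(tmp, "recoll.conf")
        concat_parts([paths_part, unac_part], combined)
        print(combined.read_bytes())
```

Working on bytes rather than text is the point of the design. Reading each fragment as text and writing it back would force a single encoding on the whole file, which is exactly the problem the INSTALL text describes.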
```diff
@@ -436,14 +452,14 @@
 a directory in topdirs might match and would still be indexed).
 
 The list in the default configuration does not exclude hidden
 directories (names beginning with a dot), which means that it may
 index quite a few things that you do not want. On the other hand,
-mail user agents like thunderbird usually store messages in hidden
-directories, and you probably want this indexed. One possible
-solution is to have .* in skippedNames, and add things like
-~/.thunderbird or ~/.evolution in topdirs.
+email user agents like thunderbird usually store messages in
+hidden directories, and you probably want this indexed. One
+possible solution is to have .* in skippedNames, and add things
+like ~/.thunderbird or ~/.evolution in topdirs.
 
 Not even the file names are indexed for patterns in this list. See
 the recoll_noindex variable in mimemap for an alternative approach
 which indexes the file names.
 
```
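The following Python sketch makes the interplay between skippedNames and topdirs concrete. It walks each top directory and prunes every file or directory whose name matches a skippedNames pattern. A directory that is itself listed in topdirs is never pruned.

The sketch assumes shell-style patterns matched against the simple name, as `fnmatch.fnmatchcase` does. It is not Recoll's walker, and other exclusion rules Recoll supports are not modelled.

```python
import fnmatch
import os


def is_skipped(name, patterns):
    """True if a simple file or directory name matches a skippedNames pattern."""
    return any(fnmatch.fnmatchcase(name, pat) for pat in patterns)


def walk_topdirs(topdirs, skipped_names):
    """Yield the paths of files that would be indexed.

    Each topdir is walked even if its own name matches a pattern (such as
    ~/.thunderbird against ".*"); below it, matching names are pruned."""
    tops = []
    for top in topdirs:
        real = os.path.realpath(os.path.expanduser(top))
        if real not in tops:
            tops.append(real)
    top_set = set(tops)
    for top in tops:
        for dirpath, dirnames, filenames in os.walk(top):
            # Prune in place so os.walk does not descend. Nested topdirs are
            # pruned too, because they get their own walk and must not be
            # listed twice.
            dirnames[:] = [
                d for d in dirnames
                if not is_skipped(d, skipped_names)
                and os.path.join(dirpath, d) not in top_set
            ]
            for name in filenames:
                # Not even the file name is kept for a matching file.
                if not is_skipped(name, skipped_names):
                    yield os.path.join(dirpath, name)
```

For example, `walk_topdirs(["~", "~/.thunderbird"], [".*"])` would skip every hidden directory under the home directory except the Thunderbird store.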
```diff
@@ -586,14 +602,37 @@
 character set definition (ie: plain text files). This can be
 redefined for any sub-directory. If it is not set at all, the
 character set used is the one defined by the nls environment
 (LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
 
+unac_except_trans
+
+This is a list of characters, encoded in UTF-8, which should be
+handled specially when converting text to unaccented lowercase.
+For example, in Swedish, the letter a with diaeresis has full
+alphabet citizenship and should not be turned into an a. Each
+element in the space-separated list has the special character as
+first element and the translation following. The handling of both
+the lowercase and upper-case versions of a character should be
+specified, as appartenance to the list will turn-off both standard
+accent and case processing. Example for Swedish:
+
+unac_except_trans = aaaa AAaa a:a: A:a: o:o: O:o:
+
+
+Note that the translation is not limited to a single character,
+you could very well have something like u:ue in the list.
+
+This parameter can't be defined for subdirectories, it is global,
+because there is no way to do otherwise when querying. If you have
+document sets which would need different values, you will have to
+index and query them separately.
+
 maildefcharset
 
 This can be used to define the default character set specifically
-for mail messages which don't specify it. This is mainly useful
+for email messages which don't specify it. This is mainly useful
 for readpst (libpst) dumps, which are utf-8 but do not say so.
 
 localfields
 
 This allows setting fields for all documents under a given
```
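Two behaviours described in this hunk can be sketched in Python.

- `default_charset` follows the fallback quoted above. The first of LC_ALL, LC_CTYPE and LANG that is set decides, its codeset part (after the dot) is used, and iso8859-1 is the last resort. Falling back to iso8859-1 when the chosen variable has no codeset part (e.g. `C`) is an assumption of this sketch, not documented Recoll behaviour.
- `unac_lower` shows the effect of an unac_except_trans list. Listed characters are replaced by their translation and skip both accent stripping and case folding. Every other character is decomposed, stripped of combining marks and lowercased.

The sample list uses real UTF-8 characters, assuming the ASCII forms in the hunk (aa, a:, o:, u:) stand for å, ä, ö and ü. Recoll's unac library uses its own tables, so results may differ on edge cases.

```python
import os
import unicodedata


def default_charset(environ=os.environ):
    """Charset for documents that do not declare one, from the nls variables."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(var)
        if value:
            # e.g. "sv_SE.UTF-8@euro" -> "UTF-8"
            codeset = value.partition(".")[2].partition("@")[0]
            # Assumption: a locale without a codeset (such as "C") falls back too.
            return codeset or "iso8859-1"
    return "iso8859-1"


def parse_unac_except_trans(value):
    """Map each special character to its translation.

    Each space-separated element is the character (assumed precomposed)
    followed by its translation, which may be several characters long."""
    return {element[0]: element[1:] for element in value.split()}


def unac_lower(text, exceptions):
    """Unaccented lowercase; characters in `exceptions` use their translation."""
    out = []
    for ch in text:
        if ch in exceptions:
            out.append(exceptions[ch])  # no accent or case processing
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        out.append(stripped.lower())
    return "".join(out)
```

With the list `åå Åå ää Ää öö Öö üue`, "Åsa Müller" becomes "åsa mueller". With an empty list it becomes "asa muller".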
```diff
@@ -775,18 +814,18 @@
 used inside the [prefixes] and [stored] sections
 
 filter-specific sections
 
 Some filters may need specific configuration for handling fields.
-Only the mail message filter currently has such a section (named
-[mail]). It allows indexing arbitrary mail headers in addition to
+Only the email message filter currently has such a section (named
+[mail]). It allows indexing arbitrary email headers in addition to
 the ones indexed by default. Other such sections may appear in the
 future.
 
 Here follows a small example of a personal fields file. This would extract
-a specific mail header and use it as a searchable field, with data
-displayable inside result lists. (Side note: as the mail filter does no
+a specific email header and use it as a searchable field, with data
+displayable inside result lists. (Side note: as the email filter does no
 decoding on the values, only plain ascii headers can be indexed, and only
 the first occurrence will be used for headers that occur several times).
 
 [prefixes]
 # Index mailmytag contents (with the given prefix)
```
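The side note about the [mail] section sets three rules: no decoding, plain ASCII only, and the first occurrence wins. These rules can be mimicked with Python's standard `email` package.

This is only an illustration of those rules, not Recoll's email filter. `X-My-Tag` is a hypothetical header name chosen to go with the mailmytag field of the example. The function returns None where the note implies the header could not be indexed.

```python
import email


def first_ascii_header(raw_message, header):
    """Return the raw value of the first occurrence of `header`, or None.

    Mirrors the side note: values are not decoded (RFC 2047 encoded-words
    stay as-is), only plain ASCII values are kept, and later duplicates
    of the header are ignored."""
    msg = email.message_from_string(raw_message)
    values = msg.get_all(header)  # every occurrence, in message order
    if not values:
        return None
    first = str(values[0])
    return first if first.isascii() else None
```

`get_all` is used instead of `msg[header]` because the `email` documentation leaves unspecified which value indexing returns when a header is repeated.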