--- a/src/INSTALL
+++ b/src/INSTALL
@@ -37,11 +37,11 @@
    In all cases, the strict software dependancies (ie on Xapian or iconv)
    will be automatically satisfied, you should not have to worry about them.
 
    You will only have to check or install supporting applications for the
    file types that you want to index beyond those that are natively processed
-   by Recoll (text, HTML, mail files, and a few others).
+   by Recoll (text, HTML, email files, and a few others).
 
    You should also maybe have a look at the configuration section (but this
    may not be necessary for a quick test with default parameters). Most
    parameters can be more conveniently set from the GUI interface.
 
@@ -167,14 +167,14 @@
 
      * Midi karaoke files need Python and the Midi module
 
      * Konqueror webarchive format with Python (uses the Tarfile module).
 
-     * mimehtml web archive format (support based on the mail filter, which
+     * mimehtml web archive format (support based on the email filter, which
        introduces some mild weirdness, but still usable).
 
-   Text, HTML, mail folders, and Scribus files are processed internally. Lyx
+   Text, HTML, email folders, and Scribus files are processed internally. Lyx
    is used to index Lyx files. Many filters need iconv and the standard sed
    and awk.
 
    --------------------------------------------------------------------------
 
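Before indexing, it can help to check which of the helpers named in the list above are actually present. The script below is a minimal sketch, not part of Recoll: the grouping of helpers is copied from the text above, and "midi" as the import name of the "Midi module" is an assumption. It also only checks the Python interpreter running the script, which may differ from the one Recoll's filters invoke.

```python
import importlib.util
import shutil

# Helper programs named in the INSTALL excerpt above (not a complete list).
HELPER_PROGRAMS = {
    "Lyx files": ["lyx"],
    "many filters": ["iconv", "sed", "awk"],
}

# Python modules named above. "midi" is an ASSUMED import name for the
# "Midi module"; "tarfile" is part of the Python standard library.
PYTHON_MODULES = {
    "Midi karaoke files": ["midi"],
    "Konqueror webarchive": ["tarfile"],
}


def missing_programs(names):
    """Return the names that shutil.which() cannot find on PATH."""
    return [name for name in names if shutil.which(name) is None]


def missing_modules(names):
    """Return the top-level module names this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report():
    """Map each file type to the list of helpers that appear to be missing."""
    result = {}
    for kind, programs in HELPER_PROGRAMS.items():
        result[kind] = missing_programs(programs)
    for kind, modules in PYTHON_MODULES.items():
        result[kind] = missing_modules(modules)
    return result


if __name__ == "__main__":
    for kind, missing in report().items():
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{kind}: {status}")
```

An empty list for a file type only means the named helpers were found; it does not prove the corresponding Recoll filter works.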
@@ -393,10 +393,26 @@
    expanded to the name of the user's home directory, as a shell would do.
 
    White space is used for separation inside lists. List elements with
    embedded spaces can be quoted using double-quotes.
 
+   Encoding issues. Most of the configuration parameters are plain ASCII. Two
+   particular sets of values may cause encoding issues:
+
+     * File path parameters may contain non-ascii characters and should use
+       the exact same byte values as found in the file system directory.
+       Usually, this means that the configuration file should use the system
+       default locale encoding.
+
+     * The unac_except_trans parameter should be encoded in UTF-8. If your
+       system locale is not UTF-8, and you need to also specify non-ascii
+       file paths, this poses a difficulty because common text editors cannot
+       handle multiple encodings in a single file. In this relatively
+       unlikely case, you can edit the configuration file as two separate
+       text files with appropriate encodings, and concatenate them to create
+       the complete configuration.
+
 5.4.1. Main configuration file
 
    recoll.conf is the main configuration file. It defines things like what to
    index (top directories and things to ignore), and the default character
    set to use for document types which do not specify it internally.
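The two-file workaround from the new "Encoding issues" paragraph can be scripted. This is a minimal sketch assuming Python 3; the intermediate file names paths.part and unac.part are hypothetical. Path parameters are encoded in the locale encoding, unac_except_trans in UTF-8, and the two parts are then concatenated as raw bytes so that neither is re-encoded.

```python
import locale
from pathlib import Path
from typing import Optional


def write_split_config(out_dir: Path, path_lines: str, unac_lines: str,
                       path_encoding: Optional[str] = None) -> Path:
    """Write recoll.conf in out_dir from two parts with different encodings.

    path_lines: parameters holding file paths, encoded in the locale
    encoding (or path_encoding if given).
    unac_lines: the unac_except_trans setting, always encoded in UTF-8.
    """
    if path_encoding is None:
        # System default locale encoding, as recommended for path parameters.
        path_encoding = locale.getpreferredencoding(False)

    # Hypothetical names for the two separately edited parts.
    paths_part = out_dir / "paths.part"
    unac_part = out_dir / "unac.part"
    paths_part.write_bytes(path_lines.encode(path_encoding))
    unac_part.write_bytes(unac_lines.encode("utf-8"))

    # Concatenate raw bytes: nothing is decoded or re-encoded, so each part
    # keeps its own encoding in the final file.
    conf = out_dir / "recoll.conf"
    conf.write_bytes(paths_part.read_bytes() + unac_part.read_bytes())
    return conf
```

The final step is a plain byte-level concatenation, equivalent to running cat on the two parts. If a path contains a character the locale encoding cannot represent, encode() raises UnicodeEncodeError instead of silently writing the wrong bytes.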
@@ -436,14 +452,14 @@
            a directory in topdirs might match and would still be indexed).
 
            The list in the default configuration does not exclude hidden
            directories (names beginning with a dot), which means that it may
            index quite a few things that you do not want. On the other hand,
-           mail user agents like thunderbird usually store messages in hidden
-           directories, and you probably want this indexed. One possible
-           solution is to have .* in skippedNames, and add things like
-           ~/.thunderbird or ~/.evolution in topdirs.
+           email user agents like thunderbird usually store messages in
+           hidden directories, and you probably want this indexed. One
+           possible solution is to have .* in skippedNames, and add things
+           like ~/.thunderbird or ~/.evolution in topdirs.
 
            Not even the file names are indexed for patterns in this list. See
            the recoll_noindex variable in mimemap for an alternative approach
            which indexes the file names.
 
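The interplay between skippedNames and topdirs described above can be modelled with shell-style patterns. This is an illustrative sketch, not Recoll's indexer. It assumes that skippedNames entries are matched case-sensitively against base names with fnmatch-style wildcards, and that a topdirs entry is walked even when its own name matches a pattern, as the text describes for ~/.thunderbird.

```python
import fnmatch
import os
from typing import Iterable, List


def is_skipped(name: str, skipped_names: Iterable[str]) -> bool:
    """True if a file or directory base name matches a skippedNames pattern."""
    return any(fnmatch.fnmatchcase(name, pat) for pat in skipped_names)


def walk_indexable(topdirs: Iterable[str], skipped_names: Iterable[str]) -> List[str]:
    """Return the file paths this model would hand to the indexer.

    Each topdirs entry is always walked, even if its own name matches a
    pattern; below it, any matching file or directory is ignored
    entirely, file name included.
    """
    skipped = list(skipped_names)
    found = []
    for top in topdirs:
        top = os.path.expanduser(top)  # "~" expands to the home directory
        for dirpath, dirnames, filenames in os.walk(top):
            # Prune in place so os.walk does not descend into skipped dirs.
            dirnames[:] = [d for d in dirnames if not is_skipped(d, skipped)]
            for f in filenames:
                if not is_skipped(f, skipped):
                    found.append(os.path.join(dirpath, f))
    return found
```

With ".*" as the only pattern and both ~ and ~/.thunderbird as topdirs, hidden directories such as ~/.cache are pruned while the mail store under ~/.thunderbird is still visited.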
@@ -586,14 +602,37 @@
            character set definition (ie: plain text files). This can be
            redefined for any sub-directory. If it is not set at all, the
            character set used is the one defined by the nls environment
            (LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
 
+   unac_except_trans
+
+           This is a list of characters, encoded in UTF-8, which should be
+           handled specially when converting text to unaccented lowercase.
+           For example, in Swedish, the letter a with diaeresis has full
+           alphabet citizenship and should not be turned into an a. Each
+           element in the space-separated list has the special character as
+           first element and the translation following. The handling of both
+           the lowercase and upper-case versions of a character should be
+           specified, as appartenance to the list will turn-off both standard
+           accent and case processing. Example for Swedish:
+
+ unac_except_trans =  aaaa AAaa a:a: A:a: o:o: O:o:
+            
+
+           Note that the translation is not limited to a single character,
+           you could very well have something like u:ue in the list.
+
+           This parameter can't be defined for subdirectories, it is global,
+           because there is no way to do otherwise when querying. If you have
+           document sets which would need different values, you will have to
+           index and query them separately.
+
    maildefcharset
 
            This can be used to define the default character set specifically
-           for mail messages which don't specify it. This is mainly useful
+           for email messages which don't specify it. This is mainly useful
            for readpst (libpst) dumps, which are utf-8 but do not say so.
 
    localfields
 
            This allows setting fields for all documents under a given
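The unac_except_trans rules can be illustrated with a small parser and folding function. Two assumptions apply. First, the Swedish example line above appears to be an ASCII rendering of the real characters, read here as "åå Åå ää Ää öö Öö". Second, Recoll's own unac library is approximated with Unicode NFD decomposition followed by removal of combining marks. A listed character gets its translation and nothing else, which is why both case forms must appear in the list.

```python
import unicodedata
from typing import Dict


def parse_unac_except_trans(value: str) -> Dict[str, str]:
    """Parse a unac_except_trans value: in each space-separated element,
    the first character is the special one and the rest its translation."""
    table: Dict[str, str] = {}
    for element in value.split():
        table[element[0]] = element[1:]
    return table


def unac_fold(text: str, exceptions: Dict[str, str]) -> str:
    """Lowercase and strip accents, except for characters in `exceptions`.

    Listed characters are replaced by their translation and get no case or
    accent processing at all. The accent stripping here (NFD + dropping
    combining marks) only APPROXIMATES Recoll's unac library.
    """
    out = []
    for ch in text:
        if ch in exceptions:
            out.append(exceptions[ch])
            continue
        decomposed = unicodedata.normalize("NFD", ch.lower())
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


# Assumed reading of the Swedish example line shown above.
swedish = parse_unac_except_trans("åå Åå ää Ää öö Öö")
print(unac_fold("Åsa Öl Über", swedish))  # åsa öl uber
```

Under the same reading, the "u:ue" case from the text corresponds to entries such as "üue Üue", which fold "Über" to "ueber". Listing only the lowercase "å" would leave "Å" to the default processing, which turns it into a plain "a".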
@@ -775,18 +814,18 @@
            used inside the [prefixes] and [stored] sections
 
    filter-specific sections
 
            Some filters may need specific configuration for handling fields.
-           Only the mail message filter currently has such a section (named
-           [mail]). It allows indexing arbitrary mail headers in addition to
+           Only the email message filter currently has such a section (named
+           [mail]). It allows indexing arbitrary email headers in addition to
            the ones indexed by default. Other such sections may appear in the
            future.
 
    Here follows a small example of a personal fields file. This would extract
-   a specific mail header and use it as a searchable field, with data
-   displayable inside result lists. (Side note: as the mail filter does no
+   a specific email header and use it as a searchable field, with data
+   displayable inside result lists. (Side note: as the email filter does no
    decoding on the values, only plain ascii headers can be indexed, and only
    the first occurrence will be used for headers that occur several times).
 
  [prefixes]
  # Index mailmytag contents (with the given prefix)
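The header-extraction limits described in the side note above (no decoding, plain ASCII only, first occurrence wins) can be mimicked with Python's standard email package. The header name X-My-Tag and the function below are hypothetical illustrations; they do not show how Recoll's email filter is implemented.

```python
from email import message_from_string
from typing import Optional


def first_ascii_header(raw_message: str, header: str) -> Optional[str]:
    """Return the first value of `header` (case-insensitive lookup), or None
    when the header is absent or its raw value is not plain ASCII.

    No RFC 2047 decoding is attempted, mirroring the limitation described
    in the text above.
    """
    msg = message_from_string(raw_message)
    values = msg.get_all(header)  # every occurrence, in message order
    if not values:
        return None
    first = str(values[0])  # only the first occurrence is kept
    return first if first.isascii() else None


# Hypothetical message carrying a hypothetical X-My-Tag header twice.
raw = "X-My-Tag: first\nX-My-Tag: second\nSubject: hi\n\nbody\n"
print(first_ascii_header(raw, "X-My-Tag"))  # first
```

get_all() is used rather than indexing the message, because the email package leaves unspecified which value msg[header] returns when a header occurs several times.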