a/src/README b/src/README
...
...
100
100
101
                6.1. Writing a document filter
101
                6.1. Writing a document filter
102
102
103
                             6.1.1. Filter HTML output
103
                             6.1.1. Filter HTML output
104
104
105
                6.2. Field data processing configuration
105
                6.2. Field data processing
106
106
107
                6.3. API
107
                6.3. API
108
108
109
                             6.3.1. Interface elements
109
                             6.3.1. Interface elements
110
110
...
...
130
130
131
                7.4. Configuration overview
131
                7.4. Configuration overview
132
132
133
                             7.4.1. Main configuration file
133
                             7.4.1. Main configuration file
134
134
135
                             7.4.2. The fields file
136
135
                             7.4.2. The mimemap file
137
                             7.4.3. The mimemap file
136
138
137
                             7.4.3. The mimeconf file
139
                             7.4.4. The mimeconf file
138
140
139
                             7.4.4. The mimeview file
141
                             7.4.5. The mimeview file
140
142
141
                             7.4.5. Examples of configuration adjustments
143
                             7.4.6. Examples of configuration adjustments
142
144
143
                7.5. The KDE Kicker Recoll applet
145
                7.5. The KDE Kicker Recoll applet
144
146
145
     ----------------------------------------------------------------------
147
     ----------------------------------------------------------------------
146
148
...
...
865
867
866
     * dir for filtering the results on file location (Ex:
868
     * dir for filtering the results on file location (Ex:
867
       dir:/home/me/somedir). Please note that this is quite inefficient,
869
       dir:/home/me/somedir). Please note that this is quite inefficient,
868
       that it may produce very slow searches, and that it may be worth in
870
       that it may produce very slow searches, and that it may be worth in
869
       some cases to set up separate databases instead.
871
       some cases to set up separate databases instead.
872
873
     * date for searching or filtering on dates. The syntax for the argument
874
       is based on the ISO8601 standard for dates and time intervals. Only
875
       dates are supported, no times. The general syntax is 2 elements
876
       separated by a / character. Each element can be a date or a period of
877
       time. Periods are specified as PnYnMnD. The n numbers are the
878
       respective numbers of years, months or days, any of which may be
879
       missing. Dates are specified as YYYY-MM-DD. The days and months parts
880
       may be missing. If the / is present but an element is missing, the
881
       missing element is interpreted as the lowest or highest date in the
882
       index. Examples:
883
884
          * 2001-03-01/2002-05-01 the basic syntax for an interval of dates.
885
886
          * 2001-03-01/P1Y2M the same specified with a period.
887
888
          * 2001/ from the beginning of 2001 to the latest date in the index.
889
890
          * 2001 the whole year of 2001
891
892
          * P2D/ means 2 days ago up to now if there are no documents with
893
            dates in the future.
894
895
          * /2003 all documents from 2003 or older.
896
897
       Periods can also be specified with small letters (ie: p2y).
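
   The date clause syntax described above can be sketched in a few lines of
   Python. This is only an illustration of the grammar as stated in this
   README, not Recoll's actual parser: the function names and the returned
   tuples are hypothetical, and the sketch only classifies the elements
   without resolving them against the index.

```python
import re

# Patterns follow the README text: YYYY[-MM[-DD]] dates and PnYnMnD periods
# (small letters accepted). The helper names are hypothetical.
DATE_RE = re.compile(r"^(\d{4})(?:-(\d{1,2})(?:-(\d{1,2}))?)?$")
PERIOD_RE = re.compile(r"^[Pp](?:(\d+)[Yy])?(?:(\d+)[Mm])?(?:(\d+)[Dd])?$")


def parse_element(text):
    """Classify one side of the '/' as a date, a period, or a missing element."""
    if text == "":
        # Missing element: lowest or highest date in the index.
        return ("open", None)
    m = PERIOD_RE.match(text)
    if m and any(m.groups()):
        years, months, days = (int(g) if g else 0 for g in m.groups())
        return ("period", (years, months, days))
    m = DATE_RE.match(text)
    if m:
        year, month, day = (int(g) if g else None for g in m.groups())
        return ("date", (year, month, day))
    raise ValueError(f"not a date or period: {text!r}")


def parse_date_clause(arg):
    """Split a date: argument into its one or two elements."""
    if "/" not in arg:
        # A lone date such as 2001 stands for the whole period it names.
        return ("single", parse_element(arg))
    left, right = arg.split("/", 1)
    return ("interval", parse_element(left), parse_element(right))


if __name__ == "__main__":
    for example in ("2001-03-01/2002-05-01", "2001-03-01/P1Y2M", "2001/",
                    "2001", "P2D/", "/2003", "p2y/"):
        print(example, "->", parse_date_clause(example))
```

   Turning the parsed elements into actual date bounds (for example adding
   a period to a start date) is left out on purpose, as the README does not
   spell out the exact rules.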
870
898
871
     * mime or format for specifying the mime type. This one is quite special
899
     * mime or format for specifying the mime type. This one is quite special
872
       because you can specify several values which will be OR'ed (the normal
900
       because you can specify several values which will be OR'ed (the normal
873
       default for the language is AND). Ex: mime:text/plain mime:text/html.
901
       default for the language is AND). Ex: mime:text/plain mime:text/html.
874
       Specifying an explicit boolean operator or negation (-) before a mime
902
       Specifying an explicit boolean operator or negation (-) before a mime
...
...
1154
   search entry field.
1182
   search entry field.
1155
1183
1156
   Wildcards. Wildcards can be used inside search terms in all forms of
1184
   Wildcards. Wildcards can be used inside search terms in all forms of
1157
   searches. More about wildcards.
1185
   searches. More about wildcards.
1158
1186
1187
   Automatic suffixes. Words like odt or ods can be automatically turned into
1188
   query language ext:xxx clauses. This can be enabled in the Search
1189
   preferences panel in the GUI.
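
   As a rough illustration of the idea, the rewriting could look like the
   Python sketch below. The function name is hypothetical, and the sketch
   assumes an exact, case-sensitive match on every word of the query; the
   real implementation may apply the rewriting more selectively.

```python
def expand_magic_suffixes(query, suffixes):
    """Turn bare words listed in `suffixes` into ext:xxx clauses."""
    wanted = set(suffixes)
    words = []
    for word in query.split():
        # Words that are not configured suffixes are passed through untouched.
        words.append(f"ext:{word}" if word in wanted else word)
    return " ".join(words)


if __name__ == "__main__":
    print(expand_magic_suffixes("budget ods", ["odt", "ods"]))
```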
1190
1159
   Disabling stem expansion. Entering a capitalized word in any search field
1191
   Disabling stem expansion. Entering a capitalized word in any search field
1160
   will prevent stem expansion (no search for gardening if you enter Garden
1192
   will prevent stem expansion (no search for gardening if you enter Garden
1161
   instead of garden). This is the only case where character case should make
1193
   instead of garden). This is the only case where character case should make
1162
   a difference for a Recoll search. You can also disable stem expansion or
1194
   a difference for a Recoll search. You can also disable stem expansion or
1163
   change the stemming language in the preferences.
1195
   change the stemming language in the preferences.
...
...
1319
       document abstracts when displaying the result list. Abstracts are
1351
       document abstracts when displaying the result list. Abstracts are
1320
       constructed by taking context from the document information, around
1352
       constructed by taking context from the document information, around
1321
       the search terms. This can slow down result list display significantly
1353
       the search terms. This can slow down result list display significantly
1322
       for big documents, and you may want to turn it off.
1354
       for big documents, and you may want to turn it off.
1323
1355
1324
     * Replace abstracts from documents: this decides if we should synthesize
1325
       and display an abstract in place of an explicit abstract found within
1326
       the document itself.
1327
1328
     * Synthetic abstract size: adjust to taste...
1356
     * Synthetic abstract size: adjust to taste...
1329
1357
1330
     * Synthetic abstract context words: how many words should be displayed
1358
     * Synthetic abstract context words: how many words should be displayed
1331
       around each term occurrence.
1359
       around each term occurrence.
1360
1361
     * Query language magic file name suffixes: a list of words which
1362
       automatically get turned into ext:xxx file name suffix clauses when
1363
       starting a query language query (ie: doc xls xlsx...). This will save
1364
       some typing for people who use file types a lot when querying.
1332
1365
1333
   External indexes: This panel will let you browse for additional indexes
1366
   External indexes: This panel will let you browse for additional indexes
1334
   that you may want to search. External indexes are designated by their
1367
   that you may want to search. External indexes are designated by their
1335
   database directory (ie: /home/someothergui/.recoll/xapiandb,
1368
   database directory (ie: /home/someothergui/.recoll/xapiandb,
1336
   /usr/local/recollglobal/xapiandb).
1369
   /usr/local/recollglobal/xapiandb).
...
...
1648
   See the following section for details about configuring how field data is
1681
   See the following section for details about configuring how field data is
1649
   processed by the indexer.
1682
   processed by the indexer.
1650
1683
1651
     ----------------------------------------------------------------------
1684
     ----------------------------------------------------------------------
1652
1685
1653
6.2. Field data processing configuration
1686
6.2. Field data processing
1654
1687
1655
   Fields are named pieces of information in or about documents, like title,
1688
   Fields are named pieces of information in or about documents, like title,
1656
   author, abstract.
1689
   author, abstract.
1657
1690
1658
   The field values for documents can appear in several ways during indexing:
1691
   The field values for documents can appear in several ways during indexing:
...
...
1673
1706
1674
     * stored, meaning that their value is recorded in the index data record
1707
     * stored, meaning that their value is recorded in the index data record
1675
       for the document, and can be returned and displayed with search
1708
       for the document, and can be returned and displayed with search
1676
       results.
1709
       results.
1677
1710
1678
   A field can be either or both indexed and stored.
1711
   A field can be either or both indexed and stored. This and other aspects
1712
   of field handling are defined inside the fields configuration file.
1679
1713
1680
   A field becomes indexed by having a prefix defined in the [prefixes]
1714
   You can find more information in the section about the fields file, or in
1681
   section of the fields file. See the comments in there for details
1715
   comments inside the file.
1682
1683
   A field becomes stored by appearing in the [stored] section of the fields
1684
   file.
1685
1686
   See the comments inside the fields for more details.
1687
1716
1688
     ----------------------------------------------------------------------
1717
     ----------------------------------------------------------------------
1689
1718
1690
6.3. API
1719
6.3. API
1691
1720
...
...
2039
2068
2040
   After an indexing pass, the commands that were found missing can be
2069
   After an indexing pass, the commands that were found missing can be
2041
   displayed from the recoll File menu. The list is stored in the missing
2070
   displayed from the recoll File menu. The list is stored in the missing
2042
   text file inside the configuration directory.
2071
   text file inside the configuration directory.
2043
2072
2044
   A list of common file types which need external commands:
2073
   A list of common file types which need external commands follows. Many of
2074
   the filters need the iconv command, which is not always listed as a
2075
   dependency.
2076
2077
   As of Recoll release 1.14, a number of XML-based formats that were handled
2078
   by ad hoc filter code now use xsltproc, which usually comes with libxslt.
2079
   These are: abiword, fb2 (ebooks), kword, openoffice, svg.
2045
2080
2046
     * Openoffice: supported natively, but needs the unzip command to be
2081
     * Openoffice: supported natively, but needs the unzip command to be
2047
       installed.
2082
       installed.
2048
2083
2049
     * PDF: pdftotext is part of the Xpdf or Poppler packages.
2084
     * PDF: pdftotext is part of the Xpdf or Poppler packages.
...
...
2051
     * Postscript: pstotext.
2086
     * Postscript: pstotext.
2052
2087
2053
     * MS Word: antiword.
2088
     * MS Word: antiword.
2054
2089
2055
     * MS Excel and PowerPoint: catdoc.
2090
     * MS Excel and PowerPoint: catdoc.
2091
2092
     * MS Open XML (docx): needs xsltproc.
2056
2093
2057
     * Wordperfect files: libwpd.
2094
     * Wordperfect files: libwpd.
2058
2095
2059
     * RTF: unrtf
2096
     * RTF: unrtf
2060
2097
...
...
2065
2102
2066
     * dvi: dvips
2103
     * dvi: dvips
2067
2104
2068
     * djvu: DjVuLibre
2105
     * djvu: DjVuLibre
2069
2106
2070
     * mp3: Recoll will use the id3info command from the id3lib package to
2107
     * mp3, flac, ogg vorbis: Recoll releases before 1.13 use the id3info
2071
       extract tag information. Without it, only the file names will be
2108
       command from the id3lib package to extract mp3 tag information. (Some
2072
       indexed.
2109
       gcc versions after 4.4 may have trouble compiling id3lib. You can find
2073
2110
       a workaround here), metaflac (standard flac tools) for flac files, and
2074
     * flac files need metaflac.
2111
       ogginfo (vorbis tools) for ogg files. Releases 1.14 and later use a
2075
2112
       single Python filter based on mutagen for all audio file types.
2076
     * ogg files need ogginfo.
2077
2113
2078
     * Pictures: Recoll uses the Exiftool Perl package to extract tag
2114
     * Pictures: Recoll uses the Exiftool Perl package to extract tag
2079
       information. Most image file formats are supported. Note that there
2115
       information. Most image file formats are supported. Note that there
2080
       may not be much interest in indexing the technical tags (image size,
2116
       may not be much interest in indexing the technical tags (image size,
2081
       aperture, etc.). This is only of interest if you store personal tags
2117
       aperture, etc.). This is only of interest if you store personal tags
2082
       or textual descriptions inside the image files.
2118
       or textual descriptions inside the image files.
2083
2119
2084
     * chm: files in microsoft help format need Python and the pychm module
2120
     * chm: files in microsoft help format need Python and the pychm module
2085
       (which needs chmlib).
2121
       (which needs chmlib).
2086
2122
2087
     * ics: iCalendar files need Python and the icalendar module.
2123
     * ics: up to Recoll 1.13, iCalendar files need Python and the icalendar
2124
       module. For newer versions, icalendar is not needed.
2088
2125
2089
     * zip: Zip archives need Python (and the standard zipfile module).
2126
     * zip: Zip archives need Python (and the standard zipfile module).
2090
2127
2091
   Text, HTML, mail folders, Openoffice and Scribus files are processed
2128
   Text, HTML, mail folders, Openoffice and Scribus files are processed
2092
   internally. Lyx is used to index Lyx files. Many filters need sed and awk.
2129
   internally. Lyx is used to index Lyx files. Many filters need iconv and
2130
   the standard sed and awk.
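
   Before a first indexing pass, it can be handy to check which of the
   helper commands named above are present on the PATH. The following
   Python sketch does that with the standard shutil.which function. The
   mapping only covers some of the commands listed in this section, and the
   names HELPERS and missing_helpers are made up for the example; Recoll
   keeps its own list in the missing file mentioned earlier.

```python
import shutil

# Command names come from the list above; the grouping is illustrative only.
HELPERS = {
    "Openoffice": ["unzip"],
    "PDF": ["pdftotext"],
    "Postscript": ["pstotext"],
    "MS Word": ["antiword"],
    "MS Excel and PowerPoint": ["catdoc"],
    "MS Open XML (docx)": ["xsltproc"],
    "RTF": ["unrtf"],
    "dvi": ["dvips"],
    "needed by many filters": ["iconv", "sed", "awk"],
}


def missing_helpers(helpers=HELPERS, which=shutil.which):
    """Return {file type: [commands not found on the PATH]}."""
    missing = {}
    for file_type, commands in helpers.items():
        absent = [cmd for cmd in commands if which(cmd) is None]
        if absent:
            missing[file_type] = absent
    return missing


if __name__ == "__main__":
    for file_type, commands in missing_helpers().items():
        print(f"{file_type}: missing {', '.join(commands)}")
```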
2093
2131
2094
     ----------------------------------------------------------------------
2132
     ----------------------------------------------------------------------
2095
2133
2096
7.3. Building from source
2134
7.3. Building from source
2097
2135
2098
  7.3.1. Prerequisites
2136
  7.3.1. Prerequisites
2099
2137
2100
   At the very least, you will need to download and install the xapian core
2138
   C++ compiler. Up to Recoll version 1.13.04, its absence can manifest
2101
   package and the qt run-time and development packages. Check the Recoll
2139
   itself by strange messages about a missing iconv_open.
2140
2141
   Development files for Xapian core
2142
2143
   Development files for Qt.
2144
2145
   Development files for X11 and zlib.
2146
2102
   download page for up to date version information.
2147
   Check the Recoll download page for up to date version information.
2103
2148
2104
   You will most probably be able to find a binary package for qt for your
2149
   You will most probably be able to find a binary package for Qt for your
2105
   system. You may have to compile Xapian but this is not difficult (if you
2150
   system. You may have to compile Xapian but this is not difficult (if you
2106
   are using FreeBSD, there is a port).
2151
   are using FreeBSD, there is a port).
2107
2152
2108
   You may also need libiconv. Recoll currently uses version 1.9 (this should
2153
   You may also need libiconv. Recoll currently uses version 1.9 (this should
2109
   not be critical). On Linux systems, the iconv interface is part of libc
2154
   not be critical). On Linux systems, the iconv interface is part of libc
...
...
2111
2156
2112
     ----------------------------------------------------------------------
2157
     ----------------------------------------------------------------------
2113
2158
2114
  7.3.2. Building
2159
  7.3.2. Building
2115
2160
2116
   Recoll has been built on Linux, FreeBSD, macosx, and Solaris, most
2161
   Recoll has been built on Linux, FreeBSD, Mac OS X, and Solaris, most
2117
   versions after 2005 should be ok, maybe some older ones too (Solaris 8 is
2162
   versions after 2005 should be ok, maybe some older ones too (Solaris 8 is
2118
   ok). If you build on another system, and need to modify things, I would
2163
   ok). If you build on another system, and need to modify things, I would
2119
   very much welcome patches.
2164
   very much welcome patches.
2120
2165
2121
   Depending on the Qt 3 configuration on your system, you may have to set
2166
   Depending on the Qt 3 configuration on your system, you may have to set
...
...
2280
   The default configuration will index your home directory. If this is not
2325
   The default configuration will index your home directory. If this is not
2281
   appropriate, start recoll to create a blank configuration, click Cancel,
2326
   appropriate, start recoll to create a blank configuration, click Cancel,
2282
   and edit the configuration file before restarting the command. This will
2327
   and edit the configuration file before restarting the command. This will
2283
   start the initial indexing, which may take some time.
2328
   start the initial indexing, which may take some time.
2284
2329
2330
   Most of the following parameters can be changed from the Index
2331
   Configuration menu in the recoll interface. Some can only be set by
2332
   editing the configuration file.
2333
2334
     ----------------------------------------------------------------------
2335
2285
   Paramers affecting what we index:
2336
    7.4.1.1. Parameters affecting what documents we index:
2286
2337
2287
   topdirs
2338
   topdirs
2288
2339
2289
           Specifies the list of directories or files to index (recursively
2340
           Specifies the list of directories or files to index (recursively
2290
           for directories). The indexer will not follow symbolic links
2341
           for directories). You can use symbolic links as elements of this
2291
           inside the indexed trees by default (see the followLinks options
2342
           list. See the followLinks option about following symbolic links
2292
           though).
2343
           found under the top elements (not followed by default).
2293
2344
2294
   skippedNames
2345
   skippedNames
2295
2346
2296
           A space-separated list of patterns for names of files or
2347
           A space-separated list of patterns for names of files or
2297
           directories that should be completely ignored. The list defined in
2348
           directories that should be completely ignored. The list defined in
...
...
2401
2452
2402
           The path to the Beagle indexing queue. This is hard-coded in the
2453
           The path to the Beagle indexing queue. This is hard-coded in the
2403
           Beagle plugin as ~/.beagle/ToIndex so there should be no need to
2454
           Beagle plugin as ~/.beagle/ToIndex so there should be no need to
2404
           change it.
2455
           change it.
2405
2456
2406
   Parameters affecting where and how we store things:
2457
     ----------------------------------------------------------------------
2407
2458
2408
   dbdir
2459
    7.4.1.2. Parameters affecting how we generate terms:
2409
2460
2410
           The name of the Xapian data directory. It will be created if
2461
   Changing some of these parameters will imply a full reindex. Also, when
2411
           needed when the index is initialized. If this is not an absolute
2462
   using multiple indexes, it may not make sense to search indexes that don't
2412
           path, it will be interpreted relative to the configuration
2463
   share the values for these parameters, because they usually affect both
2413
           directory. The value can have embedded spaces but starting or
2464
   search and index operations.
2414
           trailing spaces will be trimmed. You cannot use quotes here.
2415
2465
2416
   maxfsoccuppc
2466
   nonumbers
2417
2467
2418
           Maximum file system occupation before we stop indexing. The value
2468
           If this is set to true, no terms will be generated for numbers. For
2419
           is a percentage, corresponding to what the "Capacity" df output
2469
           example "123", "1.5e6" or "192.168.1.4" would not be indexed
2420
           column shows. The default value is 0, meaning no checking.
2470
           ("value123" would still be). Numbers are often quite interesting
2471
           to search for, and this should probably not be set except for
2472
           special situations, ie, scientific documents with huge amounts of
2473
           numbers in them. This can only be set for a whole index, not for a
2474
           subtree.
2421
2475
2422
   mboxcachedir
2476
   nocjk
2423
2477
2424
           The directory where mbox message offsets cache files are held.
2478
           If this is set to true, specific East Asian (Chinese, Korean, Japanese)
2425
           This is normally $RECOLL_CONFDIR/mboxcache, but it may be useful
2479
           characters/word splitting is turned off. This will save a small
2426
           to share a directory between different configurations.
2480
           amount of cpu if you have no CJK documents. If your document base
2481
           does include such text but you are not interested in searching it,
2482
           setting nocjk may be a significant time and space saver.
2427
2483
2428
   mboxcacheminmbs
2484
   cjkngramlen
2429
2485
2430
           The minimum mbox file size over which we cache the offsets. There
2486
           This lets you adjust the size of n-grams used for indexing CJK
2431
           is really no sense in caching offsets for small files. The default
2487
           text. The default value of 2 is probably appropriate in most
2432
           is 5 MB.
2488
           cases. A value of 3 would allow more precision and efficiency on
2433
2489
           longer words, but the index will be approximately twice as large.
2434
   webcachedir
2435
2436
           This is only used by the Beagle web browser plugin indexing code,
2437
           and defines where the cache for visited pages will live. Default:
2438
           $RECOLL_CONFDIR/webcache
2439
2440
   webcachemaxmbs
2441
2442
           This is only used by the Beagle web browser plugin indexing code,
2443
           and defines the maximum size for the web page cache. Default: 40
2444
           MB.
2445
2446
   idxflushmb
2447
2448
           Threshold (megabytes of new text data) where we flush from memory
2449
           to disk index. Setting this can help control memory usage. A value
2450
           of 0 means no explicit flushing, letting Xapian use its own
2451
           default, which is flushing every 10000 documents (memory usage
2452
           depends on average document size). The default value is 10.
2453
2454
   Miscellani:
2455
2456
   loglevel,daemloglevel
2457
2458
           Verbosity level for recoll and recollindex. A value of 4 lists
2459
           quite a lot of debug/information messages. 2 only lists errors.
2460
           The daem version is specific to the indexing monitor daemon.
2461
2462
   logfilename, daemlogfilename
2463
2464
           Where the messages should go. 'stderr' can be used as a special
2465
           value, and is the default. The daem version is specific to the
2466
           indexing monitor daemon.
2467
2490
2468
   indexstemminglanguages
2491
   indexstemminglanguages
2469
2492
2470
           A list of languages for which the stem expansion databases will be
2493
           A list of languages for which the stem expansion databases will be
2471
           built. See recollindex(1) or use the recollindex -l command for
2494
           built. See recollindex(1) or use the recollindex -l command for
...
...
2480
           character set definition (ie: plain text files). This can be
2503
           character set definition (ie: plain text files). This can be
2481
           redefined for any sub-directory. If it is not set at all, the
2504
           redefined for any sub-directory. If it is not set at all, the
2482
           character set used is the one defined by the nls environment
2505
           character set used is the one defined by the nls environment
2483
           (LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
2506
           (LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
2484
2507
2485
   filtermaxseconds
2486
2487
           Maximum filter execution time, after which it is aborted. Some
2488
           postscript programs just loop...
2489
2490
   maildefcharset
2508
   maildefcharset
2491
2509
2492
           This can be used to define the default character set specifically
2510
           This can be used to define the default character set specifically
2493
           for mail messages which don't specify it. This is mainly useful
2511
           for mail messages which don't specify it. This is mainly useful
2494
           for readpst (libpst) dumps, which are utf-8 but do not say so.
2512
           for readpst (libpst) dumps, which are utf-8 but do not say so.
...
...
2496
   localfields
2514
   localfields
2497
2515
2498
           This allows setting fields for all documents under a given
2516
           This allows setting fields for all documents under a given
2499
           directory. Typical usage would be to set an "rclaptg" field, to be
2517
           directory. Typical usage would be to set an "rclaptg" field, to be
2500
           used in mimeview to select a specific viewer. If several fields
2518
           used in mimeview to select a specific viewer. If several fields
2501
           are to be set, they should be separated with a ':' character
2519
           are to be set, they should be separated with a colon (':')
2502
           (which there is currently no way to escape). Ie: localfields=
2520
           character (which there is currently no way to escape). Ie:
2503
           rclaptg=gnus:other = val, then select specific viewer with
2521
           localfields= rclaptg=gnus:other = val, then select specific
2504
           mimetype|tag=... in mimeview.
2522
           viewer with mimetype|tag=... in mimeview.
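
   To make the localfields value format concrete, here is a small Python
   sketch that splits such a value into name/value pairs. It only mirrors
   the description above (colon-separated name=value entries, no escaping);
   the function name is hypothetical and trimming the surrounding spaces is
   an assumption, not documented behaviour.

```python
def parse_localfields(value):
    """Split 'name=val:other = val' into a dict of field names and values."""
    fields = {}
    for part in value.split(":"):
        if not part.strip():
            continue  # ignore empty entries such as a trailing colon
        name, sep, val = part.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in localfields entry: {part!r}")
        # Assumption: spaces around names and values are not significant.
        fields[name.strip()] = val.strip()
    return fields


if __name__ == "__main__":
    print(parse_localfields("rclaptg=gnus:other = val"))
```

   Because the colon cannot be escaped, a value containing ':' would be cut
   in two by this kind of splitting, which is the limitation noted above.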
2523
2524
     ----------------------------------------------------------------------
2525
2526
    7.4.1.3. Parameters affecting where and how we store things:
2527
2528
   dbdir
2529
2530
           The name of the Xapian data directory. It will be created if
2531
           needed when the index is initialized. If this is not an absolute
2532
           path, it will be interpreted relative to the configuration
2533
           directory. The value can have embedded spaces but starting or
2534
           trailing spaces will be trimmed. You cannot use quotes here.
2535
2536
   maxfsoccuppc
2537
2538
           Maximum file system occupation before we stop indexing. The value
2539
           is a percentage, corresponding to what the "Capacity" df output
2540
           column shows. The default value is 0, meaning no checking.
2541
2542
   mboxcachedir
2543
2544
           The directory where mbox message offsets cache files are held.
2545
           This is normally $RECOLL_CONFDIR/mboxcache, but it may be useful
2546
           to share a directory between different configurations.
2547
2548
   mboxcacheminmbs
2549
2550
           The minimum mbox file size over which we cache the offsets. There
2551
           is really no sense in caching offsets for small files. The default
2552
           is 5 MB.
2553
2554
   webcachedir
2555
2556
           This is only used by the Beagle web browser plugin indexing code,
2557
           and defines where the cache for visited pages will live. Default:
2558
           $RECOLL_CONFDIR/webcache
2559
2560
   webcachemaxmbs
2561
2562
           This is only used by the Beagle web browser plugin indexing code,
2563
           and defines the maximum size for the web page cache. Default: 40
2564
           MB.
2565
2566
   idxflushmb
2567
2568
           Threshold (megabytes of new text data) where we flush from memory
2569
           to disk index. Setting this can help control memory usage. A value
2570
           of 0 means no explicit flushing, letting Xapian use its own
2571
           default, which is flushing every 10000 documents (memory usage
2572
           depends on average document size). The default value is 10.
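
   The maxfsoccuppc check described above compares a percentage with what
   df reports in its "Capacity" column. A minimal Python sketch of such a
   check follows. It approximates the df figure as used / (used + free)
   from shutil.disk_usage; the function names and the use of ">=" for the
   comparison are assumptions made for the example.

```python
import shutil


def fs_occupation_percent(path):
    """Approximate the df "Capacity" value for the file system holding path."""
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free
    if denominator == 0:
        return 0.0
    return 100.0 * usage.used / denominator


def should_stop_indexing(path, maxfsoccuppc, occupation=fs_occupation_percent):
    """Return True when the occupation threshold is reached.

    A threshold of 0 (the documented default) disables the check.
    """
    if maxfsoccuppc <= 0:
        return False
    return occupation(path) >= maxfsoccuppc


if __name__ == "__main__":
    print(f"{fs_occupation_percent('.'):.1f}% used")
    print("stop indexing:", should_stop_indexing(".", 90))
```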
2573
2574
     ----------------------------------------------------------------------
2575
2576
    7.4.1.4. Miscellaneous parameters:
2577
2578
   loglevel,daemloglevel
2579
2580
           Verbosity level for recoll and recollindex. A value of 4 lists
2581
           quite a lot of debug/information messages. 2 only lists errors.
2582
           The daem version is specific to the indexing monitor daemon.
2583
2584
   logfilename, daemlogfilename
2585
2586
           Where the messages should go. 'stderr' can be used as a special
2587
           value, and is the default. The daem version is specific to the
2588
           indexing monitor daemon.
2589
2590
   filtermaxseconds
2591
2592
           Maximum filter execution time, after which it is aborted. Some
2593
           postscript programs just loop...
2505
2594
2506
   filtersdir
2595
   filtersdir
2507
2596
2508
           A directory to search for the external filter scripts used to
2597
           A directory to search for the external filter scripts used to
2509
           index some types of files. The value should not be changed, except
2598
           index some types of files. The value should not be changed, except
...
...
2540
2629
2541
           If this is set, the aspell dictionary generation is turned off.
2630
           If this is set, the aspell dictionary generation is turned off.
2542
           Useful for cases where you don't need the functionality or when it
2631
           Useful for cases where you don't need the functionality or when it
2543
           is unusable because aspell crashes during dictionary generation.
2632
           is unusable because aspell crashes during dictionary generation.
2544
2633
2545
   nocjk
2546
2547
           If this is set to true, specific East Asian (Chinese, Korean, Japanese)
2548
           characters/word splitting is turned off. This will save a small
2549
           amount of cpu if you have no CJK documents. If your document base
2550
           does include such text but you are not interested in searching it,
2551
           setting nocjk may be a significant time and space saver.
2552
2553
   cjkngramlen
2554
2555
           This lets you adjust the size of n-grams used for indexing CJK
2556
           text. The default value of 2 is probably appropriate in most
2557
           cases. A value of 3 would allow more precision and efficiency on
2558
           longer words, but the index will be approximately twice as large.
2559
2560
   guesscharset
2634
   guesscharset
2561
2635
2562
           Decide if we try to guess the character set of files if no
2636
           Decide if we try to guess the character set of files if no
2563
           internal value is available (ie: for plain text files). This does
2637
           internal value is available (ie: for plain text files). This does
2564
           not work well in general, and should probably not be used.
2638
           not work well in general, and should probably not be used.
2565
2639
2566
     ----------------------------------------------------------------------
2640
     ----------------------------------------------------------------------
2567
2641
2642
  7.4.2. The fields file

   This file contains information about dynamic field handling in Recoll.
   Some very basic fields have hard-wired behaviour and, in most cases, you
   should not change the original data inside the fields file. But you can
   create custom fields fitting your data and handle them just as if they
   were native ones.

   The fields file has several sections, each of which defines an aspect of
   field processing. Quite often, you'll have to modify several sections to
   obtain the desired behaviour.

   Only a short description is given here; refer to the comments inside the
   file for more detailed information.

   Field names should be lowercase alphabetic ASCII.

   [prefixes]

           A field becomes indexed (searchable) by having a prefix defined in
           this section.

   [stored]

           A field becomes stored (displayable inside results) by having its
           name listed in this section (typically with an empty value).

   [aliases]

           This section defines lists of synonyms for the canonical names
           used inside the [prefixes] and [stored] sections.

   filter-specific sections

           Some filters may need specific configuration for handling fields.
           Only the mail message filter currently has such a section (named
           [mail]). It allows indexing arbitrary mail headers in addition to
           the ones indexed by default. Other such sections may appear in the
           future.

   Here follows a small example of a personal fields file. This would extract
   a specific mail header and use it as a searchable field, with data
   displayable inside result lists. (Side note: as the mail filter does no
   decoding on the values, only plain ASCII headers can be indexed, and only
   the first occurrence will be used for headers that occur several times.)

 [prefixes]
 # Index mailmytag contents (with the given prefix)
 mailmytag = XMTAG

 [stored]
 # Store mailmytag inside the document data record (so that it can be
 # displayed - as %(mailmytag) - in result lists).
 mailmytag =

 [mail]
 # Extract the X-My-Tag mail header, and use it internally with the
 # mailmytag field name
 x-my-tag = mailmytag

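   As a further sketch, and assuming that an alias is declared by listing
   it after the canonical name in the [aliases] section, the field defined
   above could be given a shorter synonym. The name mytag is purely
   illustrative and not a predefined one:

 [aliases]
 # Hypothetical alias: lets mytag stand for the canonical mailmytag
 mailmytag = mytag

   With this in place, and assuming the usual field:value query syntax, a
   search such as mytag:urgent would be expected to match messages whose
   X-My-Tag header contains "urgent".
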
     ----------------------------------------------------------------------

  7.4.3. The mimemap file

   mimemap specifies the file name extension to mime type mappings.

   For file names without an extension, or with an unknown one, the system's
   file -i command will be executed to determine the mime type (this can be
   ...
   given Recoll version. Having it there avoids cluttering the more
   user-oriented and locally customized skippedNames.

     ----------------------------------------------------------------------

  7.4.4. The mimeconf file

   mimeconf specifies how the different mime types are handled for indexing,
   and which icons are displayed in the recoll result lists.

   Changing the parameters in the [index] section is probably not a good idea
   ...
   recoll in the result lists (the values are the basenames of the png images
   inside the iconsdir directory (specified in recoll.conf)).

     ----------------------------------------------------------------------

  7.4.5. The mimeview file

   mimeview specifies which programs are started when you click on an Edit
   link in a result list. For example, HTML is normally displayed using
   firefox, but you may prefer Konqueror, or your openoffice.org program
   might be named ooffice instead of openoffice.
   ...
   user preferences, all mimeview entries will be ignored except the one
   labelled application/x-all (which is set to use xdg-open by default).

     ----------------------------------------------------------------------

  7.4.6. Examples of configuration adjustments

    7.4.6.1. Adding an external viewer for a non-indexed type

   Imagine that you have some kind of file which does not have indexable
   content, but for which you would like to have a functional Edit link in
   the result list (when found by file name). The file names end in .blob and
   can be displayed by the application blobviewer.
   ...
   configuration, which you do not need to alter. mimeview can also be
   modified from the GUI.

     ----------------------------------------------------------------------

    7.4.6.2. Adding indexing support for a new file type

   Let us now imagine that the above .blob files actually contain indexable
   text and that you know how to extract it with a command line program.
   Getting Recoll to index the files is easy. You need to perform the above
   alteration, and also to add data to the mimeconf file (typically in