a/src/README b/src/README
...
...
32
32
33
                2.1. Introduction
33
                2.1. Introduction
34
34
35
                2.2. Index storage
35
                2.2. Index storage
36
36
37
                             2.2.1. Index formats
37
                             2.2.1. Xapian index formats
38
38
39
                             2.2.2. Security aspects
39
                             2.2.2. Security aspects
40
40
41
                2.3. The indexing configuration
41
                2.3. Indexing configuration
42
43
                             2.3.1. The indexing configuration GUI
42
44
43
                2.4. Periodic indexing
45
                2.4. Periodic indexing
44
46
45
                             2.4.1. Starting indexing
47
                             2.4.1. Starting indexing
46
48
...
...
104
106
105
                             4.4.4. The mimeview file
107
                             4.4.4. The mimeview file
106
108
107
                             4.4.5. Examples of configuration adjustments
109
                             4.4.5. Examples of configuration adjustments
108
110
111
                4.5. The KDE Kicker Recoll applet
112
109
                4.5. Extending Recoll
113
                4.6. Extending Recoll
110
114
111
                             4.5.1. Writing a document filter
115
                             4.6.1. Writing a document filter
112
116
113
     ----------------------------------------------------------------------
117
     ----------------------------------------------------------------------
114
118
115
                            Chapter 1. Introduction
119
                            Chapter 1. Introduction
116
120
...
...
313
   The index data directory (xapiandb) only contains data that can be
317
   The index data directory (xapiandb) only contains data that can be
314
   completely rebuilt by an index run, and it can always be destroyed safely.
318
   completely rebuilt by an index run, and it can always be destroyed safely.
315
319
316
     ----------------------------------------------------------------------
320
     ----------------------------------------------------------------------
317
321
318
  2.2.1. Index formats
322
  2.2.1. Xapian index formats
323
324
   If your first installation of Recoll was 1.9.0 or more recent, you can
325
   skip this section.
319
326
320
   Xapian has had two possible index formats for quite some time. The "old"
327
   Xapian has had two possible index formats for quite some time. The "old"
321
   one named Quartz, and the new one named Flint. Xapian 0.9 used Quartz by
328
   one named Quartz, and the new one named Flint. Xapian 0.9 used Quartz by
322
   default, but could use Flint if a specific environment variable
329
   default, but could use Flint if a specific environment variable
323
   (XAPIAN_PREFER_FLINT) was set. Xapian 1.0 still supports Quartz but will
330
   (XAPIAN_PREFER_FLINT) was set. Xapian 1.0 still supports Quartz but will
...
...
352
   mode of 0700 (access by owner only). As the index data directory is by
359
   mode of 0700 (access by owner only). As the index data directory is by
353
   default a sub-directory of the configuration directory, this should result
360
   default a sub-directory of the configuration directory, this should result
354
   in appropriate protection.
361
   in appropriate protection.
355
362
356
   If you use another setup, you should think of the kind of protection you
363
   If you use another setup, you should think of the kind of protection you
357
   need for your index, and set the directory and files access modes
364
   need for your index, set the directory and files access modes
358
   appropriately.
365
   appropriately, and possibly also adjust the umask used during index updates.
359
366
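The protection described above (a 0700 index directory, plus a restrictive
umask while the index is updated) can be sketched as follows. This is only an
illustration under stated assumptions: the helper names are hypothetical, the
directory path is supplied by the caller, and nothing here is part of Recoll
itself.

```python
import os
import stat


def make_private_index_dir(path):
    """Create path if needed and restrict it to owner-only access (0700).

    Returns the resulting permission bits so the caller can check them.
    """
    os.makedirs(path, exist_ok=True)
    # Explicit chmod: the mode passed to makedirs would be filtered by the umask.
    os.chmod(path, 0o700)
    return stat.S_IMODE(os.stat(path).st_mode)


def run_with_private_umask(func):
    """Run func with umask 077 so files it creates are owner-only, then restore."""
    old_umask = os.umask(0o077)
    try:
        return func()
    finally:
        os.umask(old_umask)
```

A wrapper of this kind could be placed around whatever command performs the
index update, so that files created during the update are not readable by
group or others.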
360
     ----------------------------------------------------------------------
367
     ----------------------------------------------------------------------
361
368
362
2.3. The indexing configuration
369
2.3. Indexing configuration
363
370
364
   You can control which areas of the file system are indexed, and how files
371
   Variables set inside the Recoll configuration files control which areas of
365
   are processed, by setting variables inside the Recoll configuration files.
372
   the file system are indexed, and how files are processed. These variables
373
   can be set either by editing the text files or using the dialogs in the
374
   recoll GUI.
366
375
367
   You can also use multiple indexes defined by separate configurations,
376
   You can also use multiple indexes defined by separate configurations,
368
   typically to separate personal and shared indexes, or to take advantage of
377
   typically to separate personal and shared indexes, or to take advantage of
369
   the organization of your data to improve search precision.
378
   the organization of your data to improve search precision.
370
379
...
...
381
   topdirs, which determines what subtrees get indexed.
390
   topdirs, which determines what subtrees get indexed.
382
391
383
   The applications needed to index file types other than text, HTML or email
392
   The applications needed to index file types other than text, HTML or email
384
   (ie: pdf, postscript, ms-word...) are described in the external packages
393
   (ie: pdf, postscript, ms-word...) are described in the external packages
385
   section
394
   section
395
396
     ----------------------------------------------------------------------
397
398
  2.3.1. The indexing configuration GUI
399
400
   As of Recoll 1.10, most parameters for a given indexing configuration can
401
   be set from a recoll GUI running on this configuration (either as default,
402
   or by setting RECOLL_CONFDIR or the -c option).
403
404
   The interface is started from the Preferences menu. It has two main
405
   panels. The first panel allows setting global variables, like the list of
406
   top directories or the list of skipped paths. The second panel allows
407
   setting variables that can be redefined for subdirectories. This second
408
   panel has an initially empty list of customisation directories, to which
409
   you can add. The variables are then set for the currently selected
410
   directory (or at the top level if the empty line is selected).
411
412
   The meaning of most entries in the interface is self-evident and
413
   documented by a ToolTip popup on the text label. For more detail, you will
414
   need to refer to the configuration section of this guide.
415
416
   The configuration tool normally respects the comments and most of the
417
   formatting inside the configuration file, so that it is quite possible to
418
   use it on hand-edited files, which you might nevertheless want to back up
419
   first...
386
420
387
     ----------------------------------------------------------------------
421
     ----------------------------------------------------------------------
388
422
389
2.4. Periodic indexing
423
2.4. Periodic indexing
390
424
...
...
715
749
716
   There are two other elements which may be specified through the field
750
   There are two other elements which may be specified through the field
717
   syntax, but are somewhat special:
751
   syntax, but are somewhat special:
718
752
719
     * ext for specifying the file name extension (Ex: ext:html)
753
     * ext for specifying the file name extension (Ex: ext:html)
754
755
     * dir for specifying the file location (Ex: dir:/home/me/somedir).
756
       Please note that this is quite inefficient, that it may produce very
757
       slow searches, and that in some cases it may be worthwhile to set up
758
       separate databases instead.
720
759
721
     * mime for specifying the mime type. This one is quite special because
760
     * mime for specifying the mime type. This one is quite special because
722
       you can specify several values which will be OR'ed (the normal default
761
       you can specify several values which will be OR'ed (the normal default
723
       for the language is AND). Ex: mime:text/plain mime:text/html.
762
       for the language is AND). Ex: mime:text/plain mime:text/html.
724
       Specifying an explicit boolean operator or negation (-) before a mime
763
       Specifying an explicit boolean operator or negation (-) before a mime
...
...
1201
1240
1202
     * Wordperfect files: libwpd.
1241
     * Wordperfect files: libwpd.
1203
1242
1204
     * RTF: unrtf
1243
     * RTF: unrtf
1205
1244
1245
     * TeX: Recoll uses the untex program. Your distribution may have a
1246
       package for it. If it doesn't, there is a copy of the source on the
1247
       Recoll web site, because the program has no obvious home. The filter
1248
       can also work with detex and will use it if it is installed.
1249
1206
     * dvi: dvips
1250
     * dvi: dvips
1207
1251
1208
     * djvu: DjVuLibre
1252
     * djvu: DjVuLibre
1209
1253
1210
     * MP3: Recoll will use the id3info command from the id3lib package to
1254
     * MP3: Recoll will use the id3info command from the id3lib package to
...
...
1497
           Decide if we use the file -i system command as a final step for
1541
           Decide if we use the file -i system command as a final step for
1498
           determining the mime type for a file (the main procedure uses
1542
           determining the mime type for a file (the main procedure uses
1499
           suffix associations as defined in the mimemap file). This can be
1543
           suffix associations as defined in the mimemap file). This can be
1500
           useful for files with suffix-less names, but it will also cause
1544
           useful for files with suffix-less names, but it will also cause
1501
           the indexing of many bogus "text" files.
1545
           the indexing of many bogus "text" files.
1546
1547
   indexedmimetypes
1548
1549
           Recoll normally indexes any file which it knows how to read. This
1550
           list lets you restrict the indexed mime types to what you specify.
1551
           If the variable is unspecified or the list empty (the default),
1552
           all supported types are processed.
1502
1553
1503
   indexallfilenames
1554
   indexallfilenames
1504
1555
1505
           Recoll indexes file names in a special section of the database to
1556
           Recoll indexes file names in a special section of the database to
1506
           allow specific file names searches using wild cards. This
1557
           allow specific file names searches using wild cards. This
...
...
1534
1585
1535
           If this is set, the aspell dictionary generation is turned off.
1586
           If this is set, the aspell dictionary generation is turned off.
1536
           Useful for cases where you don't need the functionality or when it
1587
           Useful for cases where you don't need the functionality or when it
1537
           is unusable because aspell crashes during dictionary generation.
1588
           is unusable because aspell crashes during dictionary generation.
1538
1589
1590
   nocjk
1591
1592
           If this is set to true, specific East Asian (Chinese, Korean, Japanese)
1593
           characters/word splitting is turned off. This will save a small
1594
           amount of cpu if you have no CJK documents. If your document base
1595
           does include such text but you are not interested in searching it,
1596
           setting nocjk may be a significant time and space saver.
1597
1598
   cjkngramlen
1599
1600
           This lets you adjust the size of n-grams used for indexing CJK
1601
           text. The default value of 2 is probably appropriate in most
1602
           cases. A value of 3 would allow more precision and efficiency on
1603
           longer words, but the index will be approximately twice as large.
1604
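To make the cjkngramlen trade-off more concrete, here is a minimal sketch of
overlapping character n-grams, one common way to index text that has no word
separators. The function name is hypothetical and the code only illustrates
the general idea; Recoll's actual splitter may work differently.

```python
def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text.

    Text shorter than n is returned as a single term.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not text:
        return []
    if len(text) <= n:
        return [text]
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Four characters give three bigrams, or two trigrams.
bigrams = char_ngrams("東京都庁", 2)   # ['東京', '京都', '都庁']
trigrams = char_ngrams("東京都庁", 3)  # ['東京都', '京都庁']
```

Longer n-grams are more specific, which helps precision on longer words, but
there are many more distinct terms to store, which is consistent with the
larger index mentioned above.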
1539
     ----------------------------------------------------------------------
1605
     ----------------------------------------------------------------------
1540
1606
1541
  4.4.2. The mimemap file
1607
  4.4.2. The mimemap file
1542
1608
1543
   mimemap specifies the file name extension to mime type mappings.
1609
   mimemap specifies the file name extension to mime type mappings.
...
...
1666
   You can find more details about writing a Recoll filter in the section
1732
   You can find more details about writing a Recoll filter in the section
1667
   about writing filters
1733
   about writing filters
1668
1734
1669
     ----------------------------------------------------------------------
1735
     ----------------------------------------------------------------------
1670
1736
1737
4.5. The KDE Kicker Recoll applet
1738
1739
   The Recoll source tree contains the source code to the recoll_applet, a
1740
   small application derived from the find_applet. This can be used to add a
1741
   small Recoll launcher to the KDE panel.
1742
1743
   The applet is not automatically built with the main Recoll programs. To
1744
   build it, you need to unpack the Recoll source code, then go to the
1745
   kde/recoll_applet/ directory, and type the usual configure;make;make
1746
   install.
1747
1748
   You can then add the applet to the panel by right-clicking the panel and
1749
   choosing the Add applet entry.
1750
1751
   The recoll_applet has a small text window where you can type a Recoll
1752
   query (in query language form), and an icon which can be used to restrict
1753
   the search to certain types of files.
1754
1755
     ----------------------------------------------------------------------
1756
1671
4.5. Extending Recoll
1757
4.6. Extending Recoll
1672
1758
1673
  4.5.1. Writing a document filter
1759
  4.6.1. Writing a document filter
1674
1760
1675
   Recoll filters are executable programs which translate from a specific
1761
   Recoll filters are executable programs which translate from a specific
1676
   format (ie: openoffice, acrobat, etc.) to the Recoll indexing input
1762
   format (ie: openoffice, acrobat, etc.) to the Recoll indexing input
1677
   format, which was chosen to be HTML.
1763
   format, which was chosen to be HTML.
1678
1764