a/src/README b/src/README
...
...
12
12
13
   This document introduces full text search notions and describes the
13
   This document introduces full text search notions and describes the
14
   installation and use of the Recoll application. It currently describes
14
   installation and use of the Recoll application. It currently describes
15
   Recoll 1.18.
15
   Recoll 1.18.
16
16
17
   [ Split HTML / Single HTML ]
18
19
     ----------------------------------------------------------------------
17
     ----------------------------------------------------------------------
20
18
21
   Table of Contents
19
   Table of Contents
22
20
23
   1. Introduction
21
   1. Introduction
...
...
52
50
53
                             2.3.2. Index case and diacritics sensitivity
51
                             2.3.2. Index case and diacritics sensitivity
54
52
55
                             2.3.3. The index configuration GUI
53
                             2.3.3. The index configuration GUI
56
54
57
                2.4. Using Beagle WEB browser plugins
55
                2.4. Index WEB visited page history
58
56
59
                2.5. Periodic indexing
57
                2.5. Periodic indexing
60
58
61
                             2.5.1. Running indexing
59
                             2.5.1. Running indexing
62
60
...
...
75
73
76
                             3.1.2. The default result list
74
                             3.1.2. The default result list
77
75
78
                             3.1.3. The result table
76
                             3.1.3. The result table
79
77
78
                             3.1.4. Displaying thumbnails
79
80
                             3.1.4. The preview window
80
                             3.1.5. The preview window
81
81
82
                             3.1.5. Complex/advanced search
82
                             3.1.6. Complex/advanced search
83
83
84
                             3.1.6. The term explorer tool
84
                             3.1.7. The term explorer tool
85
85
86
                             3.1.7. Multiple indexes
86
                             3.1.8. Multiple indexes
87
87
88
                             3.1.8. Document history
88
                             3.1.9. Document history
89
89
90
                             3.1.9. Sorting search results and collapsing
90
                             3.1.10. Sorting search results and collapsing
91
                             duplicates
91
                             duplicates
92
92
93
                             3.1.10. Search tips, shortcuts
93
                             3.1.11. Search tips, shortcuts
94
94
95
                             3.1.11. Customizing the search interface
95
                             3.1.12. Customizing the search interface
96
96
97
                3.2. Searching with the KDE KIO slave
97
                3.2. Searching with the KDE KIO slave
98
98
99
                             3.2.1. What's this
99
                             3.2.1. What's this
100
100
...
...
124
124
125
                4.1. Writing a document filter
125
                4.1. Writing a document filter
126
126
127
                             4.1.1. Simple filters
127
                             4.1.1. Simple filters
128
128
129
                             4.1.2. "Multiple" filters
130
129
                             4.1.2. Telling Recoll about the filter
131
                             4.1.3. Telling Recoll about the filter
130
132
131
                             4.1.3. Filter HTML output
133
                             4.1.4. Filter HTML output
132
134
133
                             4.1.4. Page numbers
135
                             4.1.5. Page numbers
134
136
135
                4.2. Field data processing
137
                4.2. Field data processing
136
138
137
                4.3. API
139
                4.3. API
138
140
...
...
170
172
171
                             5.4.5. The mimeview file
173
                             5.4.5. The mimeview file
172
174
173
                             5.4.6. Examples of configuration adjustments
175
                             5.4.6. Examples of configuration adjustments
174
176
175
     ----------------------------------------------------------------------
177
Chapter 1. Introduction
176
177
                            Chapter 1. Introduction
178
178
179
1.1. Giving it a try
179
1.1. Giving it a try
180
180
181
   If you do not like reading manuals (who does?) and would like to give
181
   If you do not like reading manuals (who does?) and would like to give
182
   Recoll a try, just install the application and start the recoll graphical
182
   Recoll a try, just install the application and start the recoll graphical
...
...
189
   area.
189
   area.
190
190
191
   Also be aware that you may need to install the appropriate supporting
191
   Also be aware that you may need to install the appropriate supporting
192
   applications for document types that need them (for example antiword for
192
   applications for document types that need them (for example antiword for
193
   Microsoft Word files).
193
   Microsoft Word files).
194
195
     ----------------------------------------------------------------------
196
194
197
1.2. Full text search
195
1.2. Full text search
198
196
199
   Recoll is a full text search application. Full text search applications
197
   Recoll is a full text search application. Full text search applications
200
   let you find your data by content rather than by external attributes (like
198
   let you find your data by content rather than by external attributes (like
...
...
225
223
226
   Stemming, by itself, does not accommodate misspellings or phonetic
224
   Stemming, by itself, does not accommodate misspellings or phonetic
227
   searches. Recoll supports these features through a specific tool (the term
225
   searches. Recoll supports these features through a specific tool (the term
228
   explorer) which will let you explore the set of index terms along
226
   explorer) which will let you explore the set of index terms along
229
   different modes.
227
   different modes.
230
231
     ----------------------------------------------------------------------
232
228
233
1.3. Recoll overview
229
1.3. Recoll overview
234
230
235
   Recoll uses the Xapian information retrieval library as its storage and
231
   Recoll uses the Xapian information retrieval library as its storage and
236
   retrieval engine. Xapian is a very mature package using a sophisticated
232
   retrieval engine. Xapian is a very mature package using a sophisticated
...
...
309
   options to help you find what you are looking for. However, there are
305
   options to help you find what you are looking for. However, there are
310
   other ways to perform Recoll searches: mostly a command line interface, a
306
   other ways to perform Recoll searches: mostly a command line interface, a
311
   Python programming interface, a KDE KIO slave module, and an Ubuntu Unity
307
   Python programming interface, a KDE KIO slave module, and an Ubuntu Unity
312
   Lens module.
308
   Lens module.
313
309
314
     ----------------------------------------------------------------------
310
Chapter 2. Indexing
315
316
                              Chapter 2. Indexing
317
311
318
2.1. Introduction
312
2.1. Introduction
319
313
320
   Indexing is the process by which the set of documents is analyzed and the
314
   Indexing is the process by which the set of documents is analyzed and the
321
   data entered into the database. Recoll indexing is normally incremental:
315
   data entered into the database. Recoll indexing is normally incremental:
...
...
325
   -z or -Z).
319
   -z or -Z).
326
320
327
   The following sections give an overview of different aspects of the
321
   The following sections give an overview of different aspects of the
328
   indexing processes and configuration, with links to detailed sections.
322
   indexing processes and configuration, with links to detailed sections.
329
323
330
     ----------------------------------------------------------------------
331
332
  2.1.1. Indexing modes
324
  2.1.1. Indexing modes
333
325
334
   Recoll indexing can be performed in two different modes:
326
   Recoll indexing can be performed in two different modes:
335
327
336
     * Periodic (or batch) indexing: indexing takes place at discrete times,
328
     o Periodic (or batch) indexing: indexing takes place at discrete times,
337
       by executing the recollindex command. The typical usage is to have a
329
       by executing the recollindex command. The typical usage is to have a
338
       nightly indexing run programmed into your cron file.
330
       nightly indexing run programmed into your cron file.
339
331
340
     * Real time indexing: indexing takes place as soon as a file is created
332
     o Real time indexing: indexing takes place as soon as a file is created
341
       or changed. recollindex runs as a daemon and uses a file system
333
       or changed. recollindex runs as a daemon and uses a file system
342
       alteration monitor such as inotify, Fam or Gamin to detect file
334
       alteration monitor such as inotify, Fam or Gamin to detect file
343
       changes.
335
       changes.
344
336
345
   The choice between the two methods is mostly a matter of preference, and
337
   The choice between the two methods is mostly a matter of preference, and
...
...
347
   indexing on a big documentation directory, and real time indexing on a
339
   indexing on a big documentation directory, and real time indexing on a
348
   small home directory). Monitoring a big file system tree can consume
340
   small home directory). Monitoring a big file system tree can consume
349
   significant system resources.
341
   significant system resources.
350
342
351
   The choice of method and the parameters used can be configured from the
343
   The choice of method and the parameters used can be configured from the
352
   recoll GUI: Preferences->Indexing schedule
344
   recoll GUI: Preferences -> Indexing schedule
353
354
     ----------------------------------------------------------------------
355
345
356
  2.1.2. Configurations, multiple indexes
346
  2.1.2. Configurations, multiple indexes
357
347
358
   The parameters describing what is to be indexed and local preferences are
348
   The parameters describing what is to be indexed and local preferences are
359
   defined in text files contained in a configuration directory.
349
   defined in text files contained in a configuration directory.
...
...
380
370
381
   For index generation, multiple configurations are totally independent from
371
   For index generation, multiple configurations are totally independent from
382
   each other. When multiple indexes need to be used for a single search,
372
   each other. When multiple indexes need to be used for a single search,
383
   some parameters should be consistent among the configurations.
373
   some parameters should be consistent among the configurations.
384
374
385
     ----------------------------------------------------------------------
386
387
  2.1.3. Document types
375
  2.1.3. Document types
388
376
389
   Recoll knows about quite a few different document types. The parameters
377
   Recoll knows about quite a few different document types. The parameters
390
   for document types recognition and processing are set in configuration
378
   for document types recognition and processing are set in configuration
391
   files.
379
   files.
...
...
402
390
403
   Other file types (e.g. PostScript, PDF, MS Word, RTF ...) need external
391
   Other file types (e.g. PostScript, PDF, MS Word, RTF ...) need external
404
   applications for preprocessing. The list is in the installation section.
392
   applications for preprocessing. The list is in the installation section.
405
   After every indexing operation, Recoll updates a list of commands that
393
   After every indexing operation, Recoll updates a list of commands that
406
   would be needed for indexing existing file types. This list can be
394
   would be needed for indexing existing file types. This list can be
407
   displayed by selecting the menu option File->Show Missing Helpers in the
395
   displayed by selecting the menu option File -> Show Missing Helpers in the
408
   recoll GUI. It is stored in the missing text file inside the configuration
396
   recoll GUI. It is stored in the missing text file inside the configuration
409
   directory.
397
   directory.
410
411
     ----------------------------------------------------------------------
412
398
413
  2.1.4. Recovery
399
  2.1.4. Recovery
414
400
415
   In the rare case where the index becomes corrupted (which can signal
401
   In the rare case where the index becomes corrupted (which can signal
416
   itself by weird search results or crashes), the index files need to be
402
   itself by weird search results or crashes), the index files need to be
417
   erased before restarting a clean indexing pass. Just delete the xapiandb
403
   erased before restarting a clean indexing pass. Just delete the xapiandb
418
   directory (see next section), or, alternatively, start the next
404
   directory (see next section), or, alternatively, start the next
419
   recollindex with the -z option, which will reset the database before
405
   recollindex with the -z option, which will reset the database before
420
   indexing.
406
   indexing.
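
   The two recovery methods above can be sketched in shell as follows. This
   is only an illustration: reset_recoll_index is a hypothetical helper name,
   and the sketch assumes that the index lives in the xapiandb subdirectory
   of the configuration directory (that is, that dbdir was not changed).

```shell
# Hypothetical helper: remove the index data for one configuration directory.
# The default of $HOME/.recoll is the typical location named in this manual.
reset_recoll_index() {
    confdir="${1:-$HOME/.recoll}"
    # Only the index data is removed; configuration files are left alone.
    if [ -d "$confdir/xapiandb" ]; then
        rm -rf -- "$confdir/xapiandb"
    fi
}

# Typical use (not run here): erase the index data, then rebuild it.
#   reset_recoll_index "$HOME/.recoll" && recollindex
# Alternative described above: let recollindex reset the database itself.
#   recollindex -z
```

   Either way the next indexing pass is a full one, so expect it to take
   longer than the usual incremental update.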
421
407
422
     ----------------------------------------------------------------------
423
424
2.2. Index storage
408
2.2. Index storage
425
409
426
   The default location for the index data is the xapiandb subdirectory of
410
   The default location for the index data is the xapiandb subdirectory of
427
   the Recoll configuration directory, typically $HOME/.recoll/xapiandb/.
411
   the Recoll configuration directory, typically $HOME/.recoll/xapiandb/.
428
   This can be changed via two different methods (with different purposes):
412
   This can be changed via two different methods (with different purposes):
429
413
430
     * You can specify a different configuration directory by setting the
414
     o You can specify a different configuration directory by setting the
431
       RECOLL_CONFDIR environment variable, or using the -c option to the
415
       RECOLL_CONFDIR environment variable, or using the -c option to the
432
       Recoll commands. This method would typically be used to index
416
       Recoll commands. This method would typically be used to index
433
       different areas of the file system to different indexes. For example,
417
       different areas of the file system to different indexes. For example,
434
       if you were to issue the following commands:
418
       if you were to issue the following commands:
435
419
...
...
443
427
444
       Using multiple configuration directories and configuration options
428
       Using multiple configuration directories and configuration options
445
       allows you to tailor multiple configurations and indexes to handle
429
       allows you to tailor multiple configurations and indexes to handle
446
       whatever subset of the available data you wish to make searchable.
430
       whatever subset of the available data you wish to make searchable.
447
431
448
     * For a given configuration directory, you can specify a non-default
432
     o For a given configuration directory, you can specify a non-default
449
       storage location for the index by setting the dbdir parameter in the
433
       storage location for the index by setting the dbdir parameter in the
450
       configuration file (see the configuration section). This method would
434
       configuration file (see the configuration section). This method would
451
       mainly be of use if you wanted to keep the configuration directory in
435
       mainly be of use if you wanted to keep the configuration directory in
452
       its default location, but desired another location for the index,
436
       its default location, but desired another location for the index,
453
       typically because of disk space concerns.
437
       typically because of disk space concerns.
...
...
466
450
467
   The index data directory (xapiandb) only contains data that can be
451
   The index data directory (xapiandb) only contains data that can be
468
   completely rebuilt by an index run (as long as the original documents
452
   completely rebuilt by an index run (as long as the original documents
469
   exist), and it can always be destroyed safely.
453
   exist), and it can always be destroyed safely.
470
454
471
     ----------------------------------------------------------------------
472
473
  2.2.1. Xapian index formats
455
  2.2.1. Xapian index formats
474
456
475
   Xapian versions usually support several formats for index storage. A given
457
   Xapian versions usually support several formats for index storage. A given
476
   major Xapian version will have a current format, used to create new
458
   major Xapian version will have a current format, used to create new
477
   indexes, and will also support the format from the previous major version.
459
   indexes, and will also support the format from the previous major version.
...
...
484
466
485
   Using the -z option to recollindex is not sufficient to change the format,
467
   Using the -z option to recollindex is not sufficient to change the format,
486
   you will have to delete all files inside the index directory (typically
468
   you will have to delete all files inside the index directory (typically
487
   ~/.recoll/xapiandb) before starting the indexing.
469
   ~/.recoll/xapiandb) before starting the indexing.
488
470
489
     ----------------------------------------------------------------------
490
491
  2.2.2. Security aspects
471
  2.2.2. Security aspects
492
472
493
   The Recoll index does not hold copies of the indexed documents. But it
473
   The Recoll index does not hold copies of the indexed documents. But it
494
   does hold enough data to allow for an almost complete reconstruction. If
474
   does hold enough data to allow for an almost complete reconstruction. If
495
   confidential data is indexed, access to the database directory should be
475
   confidential data is indexed, access to the database directory should be
...
...
501
   in appropriate protection.
481
   in appropriate protection.
502
482
503
   If you use another setup, you should think of the kind of protection you
483
   If you use another setup, you should think of the kind of protection you
504
   need for your index, set the directory and files access modes
484
   need for your index, set the directory and files access modes
505
   appropriately, and also maybe adjust the umask used during index updates.
485
   appropriately, and also maybe adjust the umask used during index updates.
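
   As a minimal sketch of the kind of protection discussed above, the
   following shell fragment removes all group and other access from an index
   directory. protect_index_dir is a hypothetical helper name, and the paths
   in the usage comments assume the default configuration directory; adapt
   them to your own setup.

```shell
# Hypothetical helper: make an index directory, and everything inside it,
# accessible to its owner only.
protect_index_dir() {
    dir="$1"
    # go-rwx removes read, write and execute/search permission for
    # group and others, and leaves the owner's permissions untouched.
    chmod -R go-rwx "$dir"
}

# Typical use (not run here), assuming the default index location:
#   protect_index_dir "$HOME/.recoll/xapiandb"
# Files created by later index updates follow the umask of the indexing
# process, so a restrictive umask can be set for that command only:
#   (umask 077; recollindex)
```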
506
507
     ----------------------------------------------------------------------
508
486
509
2.3. Index configuration
487
2.3. Index configuration
510
488
511
   Variables set inside the Recoll configuration files control which areas of
489
   Variables set inside the Recoll configuration files control which areas of
512
   the file system are indexed, and how files are processed. These variables
490
   the file system are indexed, and how files are processed. These variables
...
...
531
   section.
509
   section.
532
510
533
   As of Recoll 1.18 there are two incompatible types of Recoll indexes,
511
   As of Recoll 1.18 there are two incompatible types of Recoll indexes,
534
   depending on the treatment of character case and diacritics. The next
512
   depending on the treatment of character case and diacritics. The next
535
   section describes the two types in more detail.
513
   section describes the two types in more detail.
536
537
     ----------------------------------------------------------------------
538
514
539
  2.3.1. Multiple indexes
515
  2.3.1. Multiple indexes
540
516
541
   Multiple Recoll indexes can be created by using several configuration
517
   Multiple Recoll indexes can be created by using several configuration
542
   directories which are usually set to index different areas of the file
518
   directories which are usually set to index different areas of the file
...
...
573
   Most importantly, all indexes to be queried concurrently must have the
549
   Most importantly, all indexes to be queried concurrently must have the
574
   same option concerning character case and diacritics stripping, but there
550
   same option concerning character case and diacritics stripping, but there
575
   are other constraints. Most of the relevant parameters are described in
551
   are other constraints. Most of the relevant parameters are described in
576
   the linked section.
552
   the linked section.
577
553
578
     ----------------------------------------------------------------------
579
580
  2.3.2. Index case and diacritics sensitivity
554
  2.3.2. Index case and diacritics sensitivity
581
555
582
   As of Recoll version 1.18 you have a choice of building an index with
556
   As of Recoll version 1.18 you have a choice of building an index with
583
   terms stripped of character case and diacritics, or one with raw terms.
557
   terms stripped of character case and diacritics, or one with raw terms.
584
   For a source term of Resume, the former will store resume, the latter
558
   For a source term of Resume, the former will store resume, the latter
...
...
606
   As a cost for added capability, a raw index will be slightly bigger than a
580
   As a cost for added capability, a raw index will be slightly bigger than a
607
   stripped one (around 10%). Also, searches will be more complex, so
581
   stripped one (around 10%). Also, searches will be more complex, so
608
   probably slightly slower, and the feature is still young, so that a
582
   probably slightly slower, and the feature is still young, so that a
609
   certain amount of weirdness cannot be excluded.
583
   certain amount of weirdness cannot be excluded.
610
584
611
     ----------------------------------------------------------------------
612
613
  2.3.3. The index configuration GUI
585
  2.3.3. The index configuration GUI
614
586
615
   Most parameters for a given index configuration can be set from a recoll
587
   Most parameters for a given index configuration can be set from a recoll
616
   GUI running on this configuration (either as default, or by setting
588
   GUI running on this configuration (either as default, or by setting
617
   RECOLL_CONFDIR or the -c option.)
589
   RECOLL_CONFDIR or the -c option.)
618
590
619
   The interface is started from the Preferences->Index Configuration menu
591
   The interface is started from the Preferences -> Index Configuration menu
620
   entry. It is divided into four tabs: Global parameters, Local parameters,
592
   entry. It is divided into four tabs: Global parameters, Local parameters,
621
   Beagle web history (which is explained in the next section) and Search
593
   Web history (which is explained in the next section) and Search
622
   parameters.
594
   parameters.
623
595
624
   The Global parameters tab allows setting global variables, like the lists
596
   The Global parameters tab allows setting global variables, like the lists
625
   of top directories, skipped paths, or stemming languages.
597
   of top directories, skipped paths, or stemming languages.
626
598
...
...
641
   The configuration tool normally respects the comments and most of the
613
   The configuration tool normally respects the comments and most of the
642
   formatting inside the configuration file, so that it is quite possible to
614
   formatting inside the configuration file, so that it is quite possible to
643
   use it on hand-edited files, which you might nevertheless want to back up
615
   use it on hand-edited files, which you might nevertheless want to back up
644
   first...
616
   first...
645
617
646
     ----------------------------------------------------------------------
618
2.4. Index WEB visited page history
647
619
648
2.4. Using Beagle WEB browser plugins
620
   With the help of a Firefox extension, Recoll can index the Internet pages
621
   that you visit. The extension was initially designed for the Beagle
622
   indexer, but it has recently been renamed and better adapted to Recoll.
649
623
650
   Beagle is (was?) a concurrent desktop indexer, built on Lucene and the
651
   Mono project (C#), for which a number of add-on browser plugins were
652
   written. These work by copying visited web pages to an indexing queue
624
   The extension works by copying visited WEB pages to an indexing queue
653
   directory, which the indexer then processes. Especially, there is a
625
   directory, which Recoll then processes, indexing the data, storing it into
654
   Firefox extension.
626
   a local cache, then removing the file from the queue.
655
656
   If, for any reason, you so happen to prefer Recoll to Beagle, you can
657
   still use the Firefox plugin, which is written in Javascript and
658
   completely independent of C#, Beagle, Lucene..., and set Recoll to process
659
   the Beagle queue directory. This supposes that Beagle is not running, else
660
   both programs will fight for the same files.
661
627
662
   This feature can be enabled in the GUI Index configuration panel, or by
628
   This feature can be enabled in the GUI Index configuration panel, or by
663
   editing the configuration file (set processbeaglequeue to 1).
629
   editing the configuration file (set processwebqueue to 1).
664
630
665
   There are more recent instructions about how to find and install the
631
   A current pointer to the extension can be found, along with up-to-date
666
   Firefox extension on the Recoll wiki.
632
   instructions, on the Recoll wiki.
667
633
668
   Unfortunately, it seems that the plugin does not work anymore with recent
634
   A copy of the indexed WEB pages is retained by Recoll in a local cache
669
   Firefox versions (tried with 10.0). This is not the trivial installation
635
   (from which previews can be fetched). The cache size can be adjusted from
670
   version check issue, explicit manual indexing requests still work, but
636
   the Index configuration / Web history panel. Once the maximum size is
671
   automatic indexing on page load does not.
637
   reached, old pages are purged - both from the cache and the index - to
672
638
   make room for new ones, so you need to explicitly archive in some other
673
     ----------------------------------------------------------------------
639
   place the pages that you want to keep indefinitely.
674
640
675
2.5. Periodic indexing
641
2.5. Periodic indexing
676
642
677
  2.5.1. Running indexing
643
  2.5.1. Running indexing
678
644
...
...
687
   start indexing (except if canceled).
653
   start indexing (except if canceled).
688
654
689
   The recollindex indexing process can be interrupted by sending an
655
   The recollindex indexing process can be interrupted by sending an
690
   interrupt (Ctrl-C, SIGINT) or terminate (SIGTERM) signal. Some time may
656
   interrupt (Ctrl-C, SIGINT) or terminate (SIGTERM) signal. Some time may
691
   elapse before the process exits, because it needs to properly flush and
657
   elapse before the process exits, because it needs to properly flush and
692
   close the index. This can also be done from the recoll GUI File->Stop
658
   close the index. This can also be done from the recoll GUI File -> Stop
693
   Indexing menu entry.
659
   Indexing menu entry.
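
   Because the process may take some time to flush and close the index, a
   script that stops the indexer should wait for it to exit rather than
   assume that it is gone. A minimal sketch, in which stop_and_wait is a
   hypothetical helper and the 60 second default timeout is an arbitrary
   assumption:

```shell
# Hypothetical helper: send SIGTERM to a process, then wait until it has
# exited. $1 is the process id, $2 an optional timeout in seconds.
# Returns 0 if the process is gone, non-zero if it is still running when
# the timeout expires.
stop_and_wait() {
    pid="$1"
    timeout="${2:-60}"
    # If the signal cannot be delivered, treat the process as already gone.
    kill -TERM "$pid" 2>/dev/null || return 0
    # kill -0 sends no signal, it only checks that the process still exists.
    while [ "$timeout" -gt 0 ] && kill -0 "$pid" 2>/dev/null; do
        sleep 1
        timeout=$((timeout - 1))
    done
    ! kill -0 "$pid" 2>/dev/null
}

# Typical use (not run here), assuming a single running recollindex whose
# process id is found with pgrep:
#   stop_and_wait "$(pgrep -x recollindex)" 120
```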
694
660
695
   After such an interruption, the index will be somewhat inconsistent
661
   After such an interruption, the index will be somewhat inconsistent
696
   because some operations which are normally performed at the end of the
662
   because some operations which are normally performed at the end of the
697
   indexing pass will have been skipped (for example, the stemming and
663
   indexing pass will have been skipped (for example, the stemming and
...
...
721
   file selection process for some area of the file system, by adding the top
687
   file selection process for some area of the file system, by adding the top
722
   directory to the skippedPaths list and using an appropriate file selection
688
   directory to the skippedPaths list and using an appropriate file selection
723
   method to build the file list to be fed to recollindex -if. Trivial
689
   method to build the file list to be fed to recollindex -if. Trivial
724
   example:
690
   example:
725
691
726
            find . -name indexable.txt -print | recollindex -if
692
             find . -name indexable.txt -print | recollindex -if
727
          
693
          
728
694
729
   recollindex -i will not descend into subdirectories specified as
695
   recollindex -i will not descend into subdirectories specified as
730
   parameters, but just add them as index entries. It is up to the external
696
   parameters, but just add them as index entries. It is up to the external
731
   file selection method to build the complete file list.
697
   file selection method to build the complete file list.
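
   As an illustration of an external file selection method, the sketch below
   lists only the regular files modified during the last few days.
   list_recent_files is a hypothetical helper name and the selection
   criterion is just an example. Only regular files are listed because, as
   noted above, recollindex -i does not descend into directories.

```shell
# Hypothetical helper: print the regular files under a directory tree that
# were modified less than N days ago, one path per line.
# $1 is the top directory, $2 the number of days.
list_recent_files() {
    topdir="$1"
    days="$2"
    find "$topdir" -type f -mtime "-$days" -print
}

# Typical use (not run here): feed the list to recollindex.
#   list_recent_files "$HOME/docs" 7 | recollindex -if
```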
732
698
733
     ----------------------------------------------------------------------
734
735
  2.5.2. Using cron to automate indexing
699
  2.5.2. Using cron to automate indexing
736
700
737
   The most common way to set up indexing is to have a cron task execute it
701
   The most common way to set up indexing is to have a cron task execute it
738
   every night. For example the following crontab entry would do it every day
702
   every night. For example the following crontab entry would do it every day
739
   at 3:30AM (supposing recollindex is in your PATH):
703
   at 3:30AM (supposing recollindex is in your PATH):
...
...
743
   Or, using anacron:
707
   Or, using anacron:
744
708
745
 1  15  su mylogin -c "recollindex recollindex > /tmp/rcltraceme 2>&1"
709
 1  15  su mylogin -c "recollindex recollindex > /tmp/rcltraceme 2>&1"
746
710
747
   As of version 1.17 the Recoll GUI has dialogs to manage crontab entries
711
   As of version 1.17 the Recoll GUI has dialogs to manage crontab entries
748
   for recollindex. You can reach them from the Preferences->Indexing
712
   for recollindex. You can reach them from the Preferences -> Indexing
749
   Schedule menu. They only work with the good old cron, and do not give
713
   Schedule menu. They only work with the good old cron, and do not give
750
   access to all features of cron scheduling.
714
   access to all features of cron scheduling.
751
715
752
   The usual command to edit your crontab is crontab -e (which will usually
716
   The usual command to edit your crontab is crontab -e (which will usually
753
   start the vi editor to edit the file). You may have more sophisticated
717
   start the vi editor to edit the file). You may have more sophisticated
...
...
756
   Please be aware that there may be differences between your usual
720
   Please be aware that there may be differences between your usual
757
   interactive command line environment and the one seen by crontab commands.
721
   interactive command line environment and the one seen by crontab commands.
758
   Especially the PATH variable may be of concern. Please check the crontab
722
   Especially the PATH variable may be of concern. Please check the crontab
759
   manual pages about possible issues.
723
   manual pages about possible issues.
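
   One way to avoid surprises with PATH is to have cron call a small wrapper
   which sets the search path explicitly and fails with a clear message when
   the command cannot be found. A minimal sketch follows; run_with_path is a
   hypothetical helper, and the directories in the usage comment are
   assumptions to be adapted to where recollindex is actually installed.

```shell
# Hypothetical helper: run a command with an explicit search path.
# $1 is the PATH value to use, the remaining arguments are the command.
# The body runs in a subshell, so the caller's own PATH is not modified.
run_with_path() (
    PATH="$1"
    export PATH
    shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "$1: not found in PATH=$PATH" >&2
        exit 127
    fi
)

# Typical use from a script started by cron (not run here):
#   run_with_path /usr/local/bin:/usr/bin:/bin recollindex > /tmp/rcltraceme 2>&1
```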

2.6. Real time indexing

   Real time monitoring/indexing is performed by starting the recollindex -m
   command. With this option, recollindex will detach from the terminal and
   become a daemon, permanently monitoring file changes and updating the
...
 recollconf=$HOME/.recoll-home
 recolldata=/usr/local/share/recoll
 RECOLL_CONFDIR=$recollconf $recolldata/examples/rclmon.sh start

 fvwm

   The indexing daemon gets started, then the window manager, for which the
   session waits.

   By default the indexing daemon will monitor the state of the X11 session,
...
   email folders change. Also, monitoring large file trees by itself
   significantly taxes system resources. You probably do not want to enable
   it if your system is short on resources. Periodic indexing is adequate in
   most cases.

  2.6.1. Slowing down the reindexing rate for fast changing files

   When using the real time monitor, it may happen that some files need to be
   indexed, but change so often that they impose an excessive load on the
   system.

   Recoll provides a configuration option to specify the minimum time that
   must elapse before a file matching a given wildcard pattern can be
   reindexed. See the mondelaypatterns parameter in the configuration section.

Chapter 3. Searching

3.1. Searching with the Qt graphical user interface

   The recoll program provides the main user interface for searching. It is
   based on the Qt library.

   recoll has two search modes:

     o Simple search (the default, on the main screen) has a single entry
       field where you can enter multiple words.

     o Advanced search (a panel accessed through the Tools menu or the
       toolbox bar icon) has multiple entry fields, which you may use to
       build a logical condition, with additional filtering on file type,
       location in the file system, modification date, and size.

   In most cases, you can enter the terms as you think them, even if they
...
   printed is for East Asian languages (Chinese, Japanese, Korean). Words
   composed of single or multiple characters should be entered separated by
   white space in this case (they would typically be printed without white
   space).

  3.1.1. Simple search

    1. Start the recoll program.

    2. Possibly choose a search mode: Any term, All terms, File name or Query
...
   File name will specifically look for file names. The point of having a
   separate file name search is that wild card expansion can be performed
   more efficiently on a small subset of the index (allowing wild cards on
   the left of terms without excessive penalty). Things to know:

     o White space in the entry should match white space in the file name,
       and is not treated specially.

     o The search is insensitive to character case and accents, independently
       of the type of index.

     o An entry without any wild card character and not capitalized will be
       prepended and appended with '*' (e.g. etc -> *etc*, but Etc -> etc).

     o If you have a big index (many files), excessively generic fragments
       may result in inefficient searches.
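
   The wild card rule for File name entries described in the list above can
   be sketched as a small shell function. This only mimics the rule as
   stated here and is not Recoll code: the function name expand_fname_entry
   is hypothetical, and treating *, ? and [ as the wild card characters is
   an assumption.

```shell
# expand_fname_entry ENTRY
# Prints the pattern that the documented rule would produce for ENTRY.
expand_fname_entry() {
    case $1 in
        *\**|*\?*|*\[*)
            # Already contains a wild card character: left unchanged.
            printf '%s\n' "$1" ;;
        [[:upper:]]*)
            # Capitalized: no wild cards are added; case is folded because
            # the search is case-insensitive anyway.
            printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' ;;
        *)
            # Plain lower-case fragment: wrapped in '*' on both sides.
            printf '*%s*\n' "$1" ;;
    esac
}
```

   For instance, expand_fname_entry etc prints *etc*, while
   expand_fname_entry Etc prints etc.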

   You can search for exact phrases (adjacent words in a given order) by
   enclosing the input inside double quotes. Ex: "virtual reality".

...
   punctuation, newlines and all - except for wildcard characters (single ?
   characters are ok). Recoll will process it and produce a meaningful
   search. This is what most differentiates this mode from the Query Language
   mode, where you have to care about the syntax.

   You can use the Tools -> Advanced search dialog for more complex searches.

  3.1.2. The default result list

   After starting a search, a list of results will instantly be displayed in
   the main list window.
...
   open tabs in the existing preview window. You can use Shift+Click to force
   the creation of another preview window, which may be useful to view the
   documents side by side. (You can also browse successive results in a
   single preview window by typing Shift+ArrowUp/Down in the window).

   Clicking the Open link will start an external viewer for the document. By
   default, Recoll lets the desktop choose the appropriate application for
   most document types (there is a short list of exceptions, see below). If
   you prefer to completely customize the choice of applications, you can
   uncheck the Use desktop preferences option in the GUI preferences dialog,
   and click the Choose editor applications button to adjust the predefined
   Recoll choices. The tool accepts multiple selections of mime types (e.g.
   to set up the editor for the dozens of office file types).

   Even when Use desktop preferences is checked, there is a small list of
   exceptions, for mime types where the Recoll choice should override the
   desktop one. These are applications which are well integrated with Recoll,
   especially evince for viewing PDF and PostScript files because of its
   support for opening the document at a specific page and passing a search
   string as an argument. Of course, you can edit the list (in the GUI
   preferences) if you would prefer to lose the functionality and use the
   standard desktop tool.

   You may also change the choice of applications by editing the mimeview
   configuration file if you find this more convenient.

   The Preview and Open edit links may not be present for all entries,
   meaning that Recoll has no configured way to preview a given file type
   (which was indexed by name only), or no configured external editor for the
   file type. This can sometimes be adjusted simply by tweaking the mimemap
...
   The result list is divided into pages (the size of which you can change in
   the preferences). Use the arrow buttons in the toolbar or the links at the
   bottom of the page to browse the results.

    3.1.2.1. No results: the spelling suggestions

   When a search yields no result, and if the aspell dictionary is
   configured, Recoll will try to check for misspellings among the query
   terms, and will propose lists of replacements. Clicking on one of the
   suggestions will replace the word and restart the search. You can hold any
   of the modifier keys (Ctrl, Shift, etc.) while clicking if you would
   rather stay on the suggestion screen because several terms need
   replacement.

    3.1.2.2. The result list right-click menu

   Apart from the preview and edit links, you can display a pop-up menu by
   right-clicking over a paragraph in the result list. This menu has the
   following entries:

     o Preview

     o Open

     o Copy File Name

     o Copy Url

     o Save to File

     o Find similar

     o Preview Parent document

     o Open Parent document

     o Open Snippets Window

   The Preview and Open entries do the same thing as the corresponding links.

   The Copy File Name and Copy Url entries copy the relevant data to the
   clipboard, for later pasting.
...
   lists extracts from the document, taken around search term occurrences,
   along with the corresponding page number, as links which can be used to
   start the native viewer on the appropriate page. If the viewer supports
   it, its search function will also be primed with one of the search terms.

  3.1.3. The result table

   In Recoll 1.15 and newer, the results can be displayed in spreadsheet-like
   fashion. You can switch to this presentation by clicking the table-like
   icon in the toolbar (this is a toggle, click again to restore the list).
...
   window with the corresponding values. You can click the row to freeze the
   display. The bottom area is equivalent to a result list paragraph, with
   links for starting a preview or a native application, and an equivalent
   right-click menu. Typing Esc (the Escape key) will unfreeze the display.

  3.1.4. Displaying thumbnails

   The default format for the result list entries and the detail area of the
   result table display an icon for each result document. The icon is either
   a generic one determined from the MIME type, or a thumbnail of the
   document appearance. Thumbnails are only displayed if found in the
   standard freedesktop location, where they would typically have been
   created by a file manager.

   Recoll has no capability to create thumbnails. A relatively simple trick
   is to use the Open parent document/folder entry in the result list popup
   menu. This should open a file manager window on the containing directory,
   which should in turn create the thumbnails (depending on your settings).
   Restarting the search should then display the thumbnails.

   There are also some pointers about thumbnail generation on the Recoll
   wiki.
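
   To check whether a thumbnail already exists for a given file, the
   freedesktop naming convention can be reproduced in the shell. The sketch
   below is an illustration under stated assumptions, not a description of
   what Recoll does internally: the function name thumb_path is
   hypothetical, the md5sum command is assumed to be available, and the file
   path is assumed to be absolute and free of characters that would need
   escaping in a URI. Older desktops stored thumbnails under ~/.thumbnails
   rather than under the cache directory used here.

```shell
# thumb_path FILE
# Prints where a "normal" size thumbnail for FILE would be stored under the
# freedesktop convention: MD5 of the file URI, with a .png extension.
thumb_path() {
    hash=$(printf 'file://%s' "$1" | md5sum | cut -d' ' -f1)
    printf '%s/thumbnails/normal/%s.png\n' \
        "${XDG_CACHE_HOME:-$HOME/.cache}" "$hash"
}

# Usage:
#   [ -f "$(thumb_path /home/me/doc.pdf)" ] && echo "thumbnail exists"
```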

  3.1.5. The preview window

   The preview window opens when you first click a Preview link inside the
   result list.

   Subsequent preview requests for a given search open new tabs in the
...
   metadata stored in the index.

   You can print the current preview window contents by typing Ctrl-P (Ctrl +
   P) in the window text.

    3.1.5.1. Searching inside the preview

   The preview window has an internal search capability, mostly controlled by
   the panel at the bottom of the window, which works in two modes: as a
   classical editor incremental search, where we look for the text entered in
   the entry zone, or as a way to walk the matches between the document and
...
           list for this group will be walked. This is not the same as a text
           search, because the occurrences will include non-exact matches (as
           caused by stemming or wildcards). The search will revert to the
           text mode as soon as you edit the entry area.

  3.1.6. Complex/advanced search

   The advanced search dialog helps you build more complex queries without
   memorizing the search language constructs. It can be opened through the
   Tools menu or through the main toolbar.

...
   always performs a simple search.

   Click on the Show query details link at the top of the result page to see
   the query expansion.

    3.1.6.1. Advanced search: the "find" tab

   This part of the dialog lets you construct a query by combining multiple
   clauses of different types. Each entry field is configurable for the
   following modes:

     o All terms.

     o Any term.

     o None of the terms.

     o Phrase (exact terms in order within an adjustable window).

     o Proximity (terms in any order within an adjustable window).

     o Filename search.

   Additional entry fields can be created by clicking the Add clause button.

   When searching, the non-empty clauses will be combined either with an AND
   or an OR conjunction, depending on the choice made on the left (All
...
   quick fox with a slack of 0 will match quick fox but not quick brown fox.
   With a slack of 1 it will match the latter, but not fox quick. A proximity
   search for quick fox with the default slack will match the latter, and
   also a fox is a cunning and quick animal.
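
   The difference between phrase and proximity matching can be made concrete
   with a small shell and awk sketch. It only models the two-word examples
   given above on a single line of text and is not how Recoll evaluates
   queries: the function name words_match and the mode names phrase and near
   are hypothetical, and the slack is always passed explicitly (no default
   is assumed).

```shell
# words_match MODE SLACK W1 W2 TEXT
# MODE "phrase": W1 must come before W2. MODE "near": either order.
# Succeeds if at most SLACK other words separate W1 and W2 in TEXT.
words_match() {
    printf '%s\n' "$5" |
    awk -v mode="$1" -v slack="$2" -v w1="$3" -v w2="$4" '
    {
        # Record the word positions of every occurrence of each term.
        for (i = 1; i <= NF; i++) {
            if ($i == w1) p1[++n1] = i
            if ($i == w2) p2[++n2] = i
        }
    }
    END {
        for (a = 1; a <= n1; a++)
            for (b = 1; b <= n2; b++) {
                gap = p2[b] - p1[a]
                if (mode == "near" && gap < 0) gap = -gap
                # gap - 1 is the number of words between the two terms.
                if (gap >= 1 && gap - 1 <= slack) exit 0
            }
        exit 1
    }'
}
```

   With this model, words_match phrase 1 quick fox "quick brown fox"
   succeeds, words_match phrase 1 quick fox "fox quick" fails, and
   words_match near 10 quick fox "a fox is a cunning and quick animal"
   succeeds (10 is an arbitrary slack chosen for the illustration).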

    3.1.6.2. Advanced search: the "filter" tab

   This part of the dialog has several sections which allow filtering the
   results of a search according to a number of criteria:

     o The first section allows filtering by dates of last modification. You
       can specify both a minimum and a maximum date. The initial values are
       set according to the oldest and newest documents found in the index.

     o The next section allows filtering the results by file size. There are
       two entries for minimum and maximum size. Enter decimal numbers. You
       can use suffix multipliers: k/K, m/M, g/G, t/T for 1E3, 1E6, 1E9, 1E12
       respectively.

     o The next section allows filtering the results by their mime types, or
       mime categories (e.g. media/text/message/etc.).

       You can transfer the types between two boxes, to define which will be
       included or excluded by the search.

       The state of the file type selection can be saved as the default (the
       file type filter will not be activated at program start-up, but the
       lists will be in the restored state).

     o The bottom section allows restricting the search results to a sub-tree
       of the indexed area. You can use the Invert checkbox to search for
       files not in the sub-tree instead. If you use directory filtering
       often and on big subsets of the file system, you may think of setting
       up multiple indexes instead, as the performance may be better.

       You can use relative/partial paths for filtering. For example,
       entering dirA/dirB would match either /dir1/dirA/dirB/myfile1 or
       /dir2/dirA/dirB/someother/myfile2.
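
   Two of the filters above lend themselves to a short shell illustration:
   the size suffix multipliers and the partial path match. The functions
   below are hypothetical helpers written for this explanation, not Recoll
   code. size_to_bytes assumes whole numbers only, and path_in_subtree
   assumes that a partial path must match complete directory names.

```shell
# size_to_bytes VALUE
# Expands the k/K, m/M, g/G, t/T suffixes into powers of ten.
size_to_bytes() {
    case $1 in
        *[kK]) num=${1%?}; mult=1000 ;;
        *[mM]) num=${1%?}; mult=1000000 ;;
        *[gG]) num=${1%?}; mult=1000000000 ;;
        *[tT]) num=${1%?}; mult=1000000000000 ;;
        *)     num=$1;     mult=1 ;;
    esac
    echo $(( $num * $mult ))
}

# path_in_subtree FILTER PATH
# Succeeds if PATH lies below a directory chain named by FILTER, where
# FILTER may be a relative/partial path such as dirA/dirB.
path_in_subtree() {
    filter=${1%/}
    case $2 in
        "$filter"/*|*/"$filter"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

   For example, size_to_bytes 10k prints 10000, and path_in_subtree
   dirA/dirB /dir1/dirA/dirB/myfile1 succeeds.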

    3.1.6.3. Advanced search history

   The advanced search tool memorizes the last 100 searches performed. You
   can walk the saved searches by using the up and down arrow keys while the
   keyboard focus belongs to the advanced search dialog.

   The complex search history can be erased, along with the one for simple
   search, by selecting the File -> Erase Search History menu entry.

  3.1.7. The term explorer tool

   Recoll automatically manages the expansion of search terms to their
   derivatives (e.g. plural/singular, verb inflections). But there are other
   cases where the exact search term is not known. For example, you may not
   remember the exact spelling, or only know the beginning of the name.
...
   Double-clicking on a term in the result list will insert it into the
   simple search entry field. You can also cut/paste between the result list
   and any entry field (the end of lines will be taken care of).

  3.1.8. Multiple indexes

   See the section describing the use of multiple indexes for generalities.
   Only the aspects concerning the recoll GUI are described here.

   A recoll program instance is always associated with a specific index,
...
   RECOLL_ACTIVE_EXTRA_DBS is available for Recoll versions 1.17.2 and later.
   A change was made in the same update so that recoll will automatically
   deactivate unreachable indexes when starting up.

  3.1.9. Document history

   Documents that you actually view (with the internal preview or an external
   tool) are entered into the document history, which is remembered.

   You can display the history list by using the Tools/Doc History menu
   entry.

   You can erase the document history by using the Erase document history
   entry in the File menu.

  3.1.10. Sorting search results and collapsing duplicates

   The documents in a result list are normally sorted in order of relevance.
   It is possible to specify a different sort order, either by using the
   vertical arrows in the GUI toolbox to sort by date, or switching to the
   result table display and clicking on any header. The sort order chosen
...
   identity is based on an MD5 hash of the document container, not only of
   the text contents (so that e.g. a text document with an image added will
   not be a duplicate of the text only). Duplicates hiding is controlled by
   an entry in the GUI configuration dialog, and is off by default.
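
   The notion of container identity can be tried out from the shell by
   comparing MD5 digests of whole files. This is a rough sketch that assumes
   the md5sum command is available. The function name same_container is
   hypothetical, and the code does not reproduce how Recoll computes or
   stores its hashes.

```shell
# same_container FILE1 FILE2
# Succeeds if both files have the same MD5 digest, i.e. identical bytes.
# Reading from stdin keeps the file names out of the md5sum output.
same_container() {
    [ "$(md5sum < "$1")" = "$(md5sum < "$2")" ]
}

# Usage:
#   same_container report.odt copy-of-report.odt && echo "duplicate"
```

   Two files holding the same text but differing in any byte of the
   container (an added image, for instance) will compare as different.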

  3.1.11. Search tips, shortcuts

    3.1.11.1. Terms and search expansion

   Term completion. Typing Esc Space in the simple search entry field while
   entering a word will either complete the current word if its beginning
   matches a unique term in the index, or open a window to propose a list of
   completions.
...
...
1421
   index all directories in the file path as terms. This has been abandoned
1392
   index all directories in the file path as terms. This has been abandoned
1422
   as it did not seem really useful). Alternatively, you can use the specific
1393
   as it did not seem really useful). Alternatively, you can use the specific
1423
   file name search which will only look for file names, and may be faster
1394
   file name search which will only look for file names, and may be faster
1424
   than the generic search especially when using wildcards.
1395
   than the generic search especially when using wildcards.
1425
1396
1426
     ----------------------------------------------------------------------
1427
1428
    3.1.10.2. Working with phrases and proximity
1397
    3.1.11.2. Working with phrases and proximity
1429
1398
1430
   Phrases and Proximity searches. A phrase can be looked for by enclosing it
1399
   Phrases and Proximity searches. A phrase can be looked for by enclosing it
1431
   in double quotes. Example: "user manual" will look only for occurrences of
1400
   in double quotes. Example: "user manual" will look only for occurrences of
1432
   user immediately followed by manual. You can use the This phrase field of
1401
   user immediately followed by manual. You can use the This phrase field of
1433
   the advanced search dialog to the same effect. Phrases can be entered
1402
   the advanced search dialog to the same effect. Phrases can be entered
...
...
1453
   IBM. Searching for the word inside a phrase (e.g. "the IBM company") will
1422
   IBM. Searching for the word inside a phrase (e.g. "the IBM company") will
1454
   only match the dotted abbreviation if you increase the phrase slack (using
1423
   only match the dotted abbreviation if you increase the phrase slack (using
1455
   the advanced search panel control, or the o query language modifier).
1424
   the advanced search panel control, or the o query language modifier).
1456
   Literal occurrences of the word will be matched normally.
1425
   Literal occurrences of the word will be matched normally.
1457
1426
1458
     ----------------------------------------------------------------------
1459
1460
    3.1.10.3. Others
1427
    3.1.11.3. Others
1461
1428
1462
   Using fields. You can use the query language and field specifications to
1429
   Using fields. You can use the query language and field specifications to
1463
   only search certain parts of documents. This can be especially helpful
1430
   only search certain parts of documents. This can be especially helpful
1464
   with email, for example only searching emails from a specific originator:
1431
   with email, for example only searching emails from a specific originator:
1465
   search tips from:helpfulgui
1432
   search tips from:helpfulgui
...
...
1499
   Printing previews. Entering Ctrl-P in a preview window will print the
1466
   Printing previews. Entering Ctrl-P in a preview window will print the
1500
   currently displayed text.
1467
   currently displayed text.
1501
1468
1502
   Quitting. Entering Ctrl-Q almost anywhere will close the application.
1469
   Quitting. Entering Ctrl-Q almost anywhere will close the application.
1503
1470
1504
     ----------------------------------------------------------------------
1505
1506
  3.1.11. Customizing the search interface
1471
  3.1.12. Customizing the search interface
1507
1472
1508
   You can customize some aspects of the search interface by using the GUI
1473
   You can customize some aspects of the search interface by using the GUI
1509
   configuration entry in the Preferences menu.
1474
   configuration entry in the Preferences menu.
1510
1475
1511
   There are several tabs in the dialog, dealing with the interface itself,
1476
   There are several tabs in the dialog, dealing with the interface itself,
1512
   the parameters used for searching and returning results, and what indexes
1477
   the parameters used for searching and returning results, and what indexes
1513
   are searched.
1478
   are searched.
1514
1479
1515
   User interface parameters:
1480
   User interface parameters: 
1516
1481
1517
     * Highlight color for query terms: Terms from the user query are
1482
     o Highlight color for query terms: Terms from the user query are
1518
       highlighted in the result list samples and the preview window. The
1483
       highlighted in the result list samples and the preview window. The
1519
       color can be chosen here. Any Qt color string should work (e.g. red,
1484
       color can be chosen here. Any Qt color string should work (e.g. red,
1520
       #ff0000). The default is blue.
1485
       #ff0000). The default is blue.
1521
1486
1522
     * Style sheet: The name of a Qt style sheet text file which is applied
1487
     o Style sheet: The name of a Qt style sheet text file which is applied
1523
       to the whole Recoll application on startup. The default value is
1488
       to the whole Recoll application on startup. The default value is
1524
       empty, but there is a skeleton style sheet (recoll.qss) inside the
1489
       empty, but there is a skeleton style sheet (recoll.qss) inside the
1525
       /usr/share/recoll/examples directory. Using a style sheet, you can
1490
       /usr/share/recoll/examples directory. Using a style sheet, you can
1526
       change most recoll graphical parameters: colors, fonts, etc. See the
1491
       change most recoll graphical parameters: colors, fonts, etc. See the
1527
       sample file for a few simple examples.
1492
       sample file for a few simple examples.
1528
1493
1529
     * Maximum text size highlighted for preview: Inserting highlights on
1494
     o Maximum text size highlighted for preview: Inserting highlights on
1530
       search terms inside the text before inserting it in the preview window
1495
       search terms inside the text before inserting it in the preview window
1531
       involves quite a lot of processing, and can be disabled over the given
1496
       involves quite a lot of processing, and can be disabled over the given
1532
       text size to speed up loading.
1497
       text size to speed up loading.
1533
1498
1534
     * Prefer HTML to plain text for preview: if set, Recoll will display HTML
1499
     o Prefer HTML to plain text for preview: if set, Recoll will display HTML
1535
       as such inside the preview window. If this causes problems with the Qt
1500
       as such inside the preview window. If this causes problems with the Qt
1536
       HTML display, you can uncheck it to display the plain text version
1501
       HTML display, you can uncheck it to display the plain text version
1537
       instead.
1502
       instead.
1538
1503
1539
     * Plain text to HTML line style: when displaying plain text inside the
1504
     o Plain text to HTML line style: when displaying plain text inside the
1540
       preview window, Recoll tries to preserve some of the original text
1505
       preview window, Recoll tries to preserve some of the original text
1541
       line breaks and indentation. It can either use PRE HTML tags, which
1506
       line breaks and indentation. It can either use PRE HTML tags, which
1542
       will preserve the indentation well but will force horizontal scrolling
1507
       will preserve the indentation well but will force horizontal scrolling
1543
       for long lines, or use BR tags to break at the original line breaks,
1508
       for long lines, or use BR tags to break at the original line breaks,
1544
       which will let the editor introduce other line breaks according to the
1509
       which will let the editor introduce other line breaks according to the
1545
       window width, but will lose some of the original indentation. The
1510
       window width, but will lose some of the original indentation. The
1546
       third option has been available in recent releases and is probably now
1511
       third option has been available in recent releases and is probably now
1547
       the best one: use PRE tags with line wrapping.
1512
       the best one: use PRE tags with line wrapping.
1548
1513
1549
     * Use desktop preferences to choose document editor: if this is checked,
1514
     o Use desktop preferences to choose document editor: if this is checked,
1550
       the xdg-open utility will be used to open files when you click the
1515
       the xdg-open utility will be used to open files when you click the
1551
       Open link in the result list, instead of the application defined in
1516
       Open link in the result list, instead of the application defined in
1552
       mimeview. xdg-open will in turn use your desktop preferences to choose
1517
       mimeview. xdg-open will in turn use your desktop preferences to choose
1553
       an appropriate application.
1518
       an appropriate application.
1554
1519
1555
     * Exceptions: when using the desktop preferences for opening documents,
1520
     o Exceptions: when using the desktop preferences for opening documents,
1556
       these are mime types that will still be opened according to Recoll
1521
       these are mime types that will still be opened according to Recoll
1557
       preferences. This is useful for passing parameters like page numbers
1522
       preferences. This is useful for passing parameters like page numbers
1558
       or search strings to applications that support them (e.g. evince).
1523
       or search strings to applications that support them (e.g. evince).
1559
       This cannot be done with xdg-open which only supports passing one
1524
       This cannot be done with xdg-open which only supports passing one
1560
       parameter.
1525
       parameter.
1561
1526
1562
     * Choose editor applications: this will let you choose the command
1527
     o Choose editor applications: this will let you choose the command
1563
       started by the Open links inside the result list, for specific
1528
       started by the Open links inside the result list, for specific
1564
       document types.
1529
       document types.
1565
1530
1566
     * Display category filter as toolbar... this will let you choose if the
1531
     o Display category filter as toolbar... this will let you choose if the
1567
       document categories are displayed as a list or a set of buttons.
1532
       document categories are displayed as a list or a set of buttons.
1568
1533
1569
     * Auto-start simple search on white space entry: if this is checked, a
1534
     o Auto-start simple search on white space entry: if this is checked, a
1570
       search will be executed each time you enter a space in the simple
1535
       search will be executed each time you enter a space in the simple
1571
       search input field. This lets you look at the result list as you enter
1536
       search input field. This lets you look at the result list as you enter
1572
       new terms. This is off by default, you may like it or not...
1537
       new terms. This is off by default, you may like it or not...
1573
1538
1574
     * Start with advanced search dialog open : If you use this dialog
1539
     o Start with advanced search dialog open : If you use this dialog
1575
       frequently, checking the entries will get it to open when recoll
1540
       frequently, checking the entries will get it to open when recoll
1576
       starts.
1541
       starts.
1577
1542
1578
     * Remember sort activation state: if set, Recoll will remember the sort
1543
     o Remember sort activation state: if set, Recoll will remember the sort
1579
       tool state between invocations. It normally starts with sorting
1544
       tool state between invocations. It normally starts with sorting
1580
       disabled.
1545
       disabled.
1581
1546
1582
   Result list parameters:
1547
   Result list parameters: 
1583
1548
1584
     * Number of results in a result page
1549
     o Number of results in a result page
1585
1550
1586
     * Result list font: There is quite a lot of information shown in the
1551
     o Result list font: There is quite a lot of information shown in the
1587
       result list, and you may want to customize the font and/or font size.
1552
       result list, and you may want to customize the font and/or font size.
1588
       The rest of the fonts used by Recoll are determined by your generic Qt
1553
       The rest of the fonts used by Recoll are determined by your generic Qt
1589
       config (try the qtconfig command).
1554
       config (try the qtconfig command).
1590
1555
1591
     * Edit result list paragraph format string: allows you to change the
1556
     o Edit result list paragraph format string: allows you to change the
1592
       presentation of each result list entry. See the result list
1557
       presentation of each result list entry. See the result list
1593
       customisation section.
1558
       customisation section.
1594
1559
1595
     * Edit result page HTML header insert: allows you to define text
1560
     o Edit result page HTML header insert: allows you to define text
1596
       inserted at the end of the result page HTML header. More detail in the
1561
       inserted at the end of the result page HTML header. More detail in the
1597
       result list customisation section.
1562
       result list customisation section.
1598
1563
1599
     * Date format: allows specifying the format used for displaying dates
1564
     o Date format: allows specifying the format used for displaying dates
1600
       inside the result list. This should be specified as an strftime()
1565
       inside the result list. This should be specified as an strftime()
1601
       string (man strftime).
1566
       string (man strftime).
1602
1567
1603
     * Abstract snippet separator: for synthetic abstracts built from index
1568
     o Abstract snippet separator: for synthetic abstracts built from index
1604
       data, which are usually made of several snippets from different parts
1569
       data, which are usually made of several snippets from different parts
1605
       of the document, this defines the snippet separator, an ellipsis by
1570
       of the document, this defines the snippet separator, an ellipsis by
1606
       default.
1571
       default.
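
   The Date format entry above takes an strftime() string. Python's
   datetime.strftime accepts the same conversion specifiers, so it can be
   used to preview a format before entering it in the dialog. The two
   format strings below are only examples, not Recoll defaults.

```python
from datetime import datetime

# A fixed timestamp so the output does not depend on the current date.
STAMP = datetime(2012, 10, 25, 14, 30, 0)


def preview(fmt):
    """Return STAMP rendered with the given strftime() format string."""
    return STAMP.strftime(fmt)


print(preview("%Y-%m-%d %H:%M:%S"))
print(preview("%d/%m/%Y"))
```

   Only numeric specifiers are used here, so the result does not depend on
   the locale; specifiers such as %A or %B would.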
1607
1572
1608
   Search parameters:
1573
   Search parameters: 
1609
1574
1610
     * Hide duplicate results: decides if result list entries are shown for
1575
     o Hide duplicate results: decides if result list entries are shown for
1611
       identical documents found in different places.
1576
       identical documents found in different places.
1612
1577
1613
     * Stemming language: stemming obviously depends on the document's
1578
     o Stemming language: stemming obviously depends on the document's
1614
       language. This listbox will let you choose among the stemming databases
1579
       language. This listbox will let you choose among the stemming databases
1615
       which were built during indexing (this is set in the main
1580
       which were built during indexing (this is set in the main
1616
       configuration file), or later added with recollindex -s (See the
1581
       configuration file), or later added with recollindex -s (See the
1617
       recollindex manual). Stemming languages which are dynamically added
1582
       recollindex manual). Stemming languages which are dynamically added
1618
       will be deleted at the next indexing pass unless they are also added
1583
       will be deleted at the next indexing pass unless they are also added
1619
       in the configuration file.
1584
       in the configuration file.
1620
1585
1621
     * Automatically add phrase to simple searches: a phrase will be
1586
     o Automatically add phrase to simple searches: a phrase will be
1622
       automatically built and added to simple searches when looking for Any
1587
       automatically built and added to simple searches when looking for Any
1623
       terms. This will give a relevance boost to the results where the
1588
       terms. This will give a relevance boost to the results where the
1624
       search terms appear as a phrase (consecutive and in order).
1589
       search terms appear as a phrase (consecutive and in order).
1625
1590
1626
     * Autophrase term frequency threshold percentage: very frequent terms
1591
     o Autophrase term frequency threshold percentage: very frequent terms
1627
       should not be included in automatic phrase searches for performance
1592
       should not be included in automatic phrase searches for performance
1628
       reasons. The parameter defines the cutoff percentage (percentage of
1593
       reasons. The parameter defines the cutoff percentage (percentage of
1629
       the documents where the term appears).
1594
       the documents where the term appears).
1630
1595
1631
     * Replace abstracts from documents: this decides if we should synthesize
1596
     o Replace abstracts from documents: this decides if we should synthesize
1632
       and display an abstract in place of an explicit abstract found within
1597
       and display an abstract in place of an explicit abstract found within
1633
       the document itself.
1598
       the document itself.
1634
1599
1635
     * Dynamically build abstracts: this decides if Recoll tries to build
1600
     o Dynamically build abstracts: this decides if Recoll tries to build
1636
       document abstracts (lists of snippets) when displaying the result
1601
       document abstracts (lists of snippets) when displaying the result
1637
       list. Abstracts are constructed by taking context from the document
1602
       list. Abstracts are constructed by taking context from the document
1638
       information, around the search terms.
1603
       information, around the search terms.
1639
1604
1640
     * Synthetic abstract size: adjust to taste...
1605
     o Synthetic abstract size: adjust to taste...
1641
1606
1642
     * Synthetic abstract context words: how many words should be displayed
1607
     o Synthetic abstract context words: how many words should be displayed
1643
       around each term occurrence.
1608
       around each term occurrence.
1644
1609
1645
     * Query language magic file name suffixes: a list of words which
1610
     o Query language magic file name suffixes: a list of words which
1646
       automatically get turned into ext:xxx file name suffix clauses when
1611
       automatically get turned into ext:xxx file name suffix clauses when
1647
       starting a query language query (e.g. doc xls xlsx...). This will save
1612
       starting a query language query (e.g. doc xls xlsx...). This will save
1648
       some typing for people who use file types a lot when querying.
1613
       some typing for people who use file types a lot when querying.
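
   The magic suffix rewriting described in the last item can be pictured
   with the following sketch. It is an assumption-laden illustration, not
   Recoll's code: the function name is invented, and it rewrites every
   matching word, whereas the exact rule Recoll applies (for instance,
   whether only leading words are considered) is not stated here.

```python
def apply_magic_suffixes(query, suffixes):
    """Rewrite bare words listed in `suffixes` as ext:xxx clauses.

    Hypothetical helper: matching is case-insensitive and applies to
    every whitespace-separated word of the query.
    """
    wanted = {s.lower() for s in suffixes}
    words = []
    for word in query.split():
        words.append("ext:" + word if word.lower() in wanted else word)
    return " ".join(words)


print(apply_magic_suffixes("xls budget 2012", ["doc", "xls", "xlsx"]))
```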
1649
1614
1650
   External indexes: This panel will let you browse for additional indexes
1615
   External indexes: This panel will let you browse for additional indexes
...
...
1660
   always implicitly active. If this is not desirable, you can set up your
1625
   always implicitly active. If this is not desirable, you can set up your
1661
   configuration so that it indexes, for example, an empty directory. An
1626
   configuration so that it indexes, for example, an empty directory. An
1662
   alternative indexer may also need to implement a way of purging the index
1627
   alternative indexer may also need to implement a way of purging the index
1663
   from stale data.
1628
   from stale data.
1664
1629
1665
     ----------------------------------------------------------------------
1666
1667
    3.1.11.1. The result list format
1630
    3.1.12.1. The result list format
1668
1631
1669
   The result list presentation can be exhaustively customized by adjusting
1632
   The result list presentation can be exhaustively customized by adjusting
1670
   two elements:
1633
   two elements:
1671
1634
1672
     * The paragraph format
1635
     o The paragraph format
1673
1636
1674
     * HTML code inside the header section
1637
     o HTML code inside the header section
1675
1638
1676
   These can be edited from the Result list tab of the GUI configuration.
1639
   These can be edited from the Result list tab of the GUI configuration.
1677
1640
1678
   Newer versions of Recoll (from 1.17) use a WebKit HTML object by default
1641
   Newer versions of Recoll (from 1.17) use a WebKit HTML object by default
1679
   (this may be disabled at build time), and total customisation is possible
1642
   (this may be disabled at build time), and total customisation is possible
...
...
1686
   WebKit build), if there are restrictions to what you can do, they are
1649
   WebKit build), if there are restrictions to what you can do, they are
1687
   beyond this author's HTML/CSS/Javascript abilities... There are a few
1650
   beyond this author's HTML/CSS/Javascript abilities... There are a few
1688
   examples on the page about customising the result list on the Recoll web
1651
   examples on the page about customising the result list on the Recoll web
1689
   site.
1652
   site.
1690
1653
1691
     ----------------------------------------------------------------------
1692
1693
      3.1.11.1.1. The paragraph format
1654
      The paragraph format
1694
1655
1695
   This is an arbitrary HTML string where the following printf-like %
1656
   This is an arbitrary HTML string where the following printf-like %
1696
   substitutions will be performed:
1657
   substitutions will be performed:
1697
1658
1698
     * %A. Abstract
1659
     o %A. Abstract
1699
1660
1700
     * %D. Date
1661
     o %D. Date
1701
1662
1702
     * %I. Icon image name. This is normally determined from the mime type.
1663
     o %I. Icon image name. This is normally determined from the mime type.
1703
       The associations are defined inside the mimeconf configuration file.
1664
       The associations are defined inside the mimeconf configuration file.
1704
       If a thumbnail for the file is found at the standard Freedesktop
1665
       If a thumbnail for the file is found at the standard Freedesktop
1705
       location, this will be displayed instead.
1666
       location, this will be displayed instead.
1706
1667
1707
     * %K. Keywords (if any)
1668
     o %K. Keywords (if any)
1708
1669
1709
     * %L. Precooked Preview, Edit, and possibly Snippets links
1670
     o %L. Precooked Preview, Edit, and possibly Snippets links
1710
1671
1711
     * %M. Mime type
1672
     o %M. Mime type
1712
1673
1713
     * %N. result Number inside the result page
1674
     o %N. result Number inside the result page
1714
1675
1715
     * %R. Relevance percentage
1676
     o %R. Relevance percentage
1716
1677
1717
     * %S. Size information
1678
     o %S. Size information
1718
1679
1719
     * %T. Title or Filename if not set.
1680
     o %T. Title or Filename if not set.
1720
1681
1721
     * %t. Title or Filename if not set.
1682
     o %t. Title or Filename if not set.
1722
1683
1723
     * %U. Url
1684
     o %U. Url
1724
1685
1725
   The format of the Preview, Edit, and Snippets links is <a href="P%N">, <a
1686
   The format of the Preview, Edit, and Snippets links is <a href="P%N">, <a
1726
   href="E%N"> and <a href="A%N"> where docnum (%N) expands to the document
1687
   href="E%N"> and <a href="A%N"> where docnum (%N) expands to the document
1727
   number inside the result page.
1688
   number inside the result page.
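
   To make the substitution mechanism concrete, here is a minimal sketch
   of how such a format string could be expanded. The function name, the
   sample format and the sample values are all assumptions for the
   example; Recoll performs the real substitution internally.

```python
import re


def expand_format(fmt, values):
    """Replace %X sequences in `fmt` using the `values` mapping.

    Letters with no entry in `values` are left untouched.
    """
    def repl(match):
        return values.get(match.group(1), match.group(0))

    return re.sub(r"%([A-Za-z])", repl, fmt)


fmt = '<p>%N. <b>%T</b> (%M, %D)<br>%A <a href="P%N">Preview</a></p>'
values = {
    "N": "1",
    "T": "comptes.html",
    "M": "text/html",
    "D": "2012-10-25",
    "A": "sample abstract text",
}
print(expand_format(fmt, values))
```

   Note how %N is expanded both in the visible text and inside the
   href="P%N" link target, which is what ties a link to its result.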
1728
1689
...
...
1763
   how they look.
1724
   how they look.
1764
1725
1765
   It is also possible to define the value of the snippet separator inside
1726
   It is also possible to define the value of the snippet separator inside
1766
   the abstract section.
1727
   the abstract section.
1767
1728
1768
     ----------------------------------------------------------------------
1769
1770
3.2. Searching with the KDE KIO slave
1729
3.2. Searching with the KDE KIO slave
1771
1730
1772
  3.2.1. What's this
1731
  3.2.1. What's this
1773
1732
1774
   The Recoll KIO slave allows performing a Recoll search by entering an
1733
   The Recoll KIO slave allows performing a Recoll search by entering an
...
...
1791
1750
1792
   The instructions for building this module are located in the source tree.
1751
   The instructions for building this module are located in the source tree.
1793
   See: kde/kio/recoll/00README.txt. Some Linux distributions do package the
1752
   See: kde/kio/recoll/00README.txt. Some Linux distributions do package the
1794
   kio-recoll module, so check before diving into the build process, maybe
1753
   kio-recoll module, so check before diving into the build process, maybe
1795
   it's already out there ready for one-click installation.
1754
   it's already out there ready for one-click installation.
1796
1797
     ----------------------------------------------------------------------
1798
1755
1799
  3.2.2. Searchable documents
1756
  3.2.2. Searchable documents
1800
1757
1801
   As a sample application, the Recoll KIO slave could allow preparing a set
1758
   As a sample application, the Recoll KIO slave could allow preparing a set
1802
   of HTML documents (for example a manual) so that they become their own
1759
   of HTML documents (for example a manual) so that they become their own
...
...
1815
     }
1772
     }
1816
 </script>
1773
 </script>
1817
  ....
1774
  ....
1818
 <body ondblclick="recollsearch()">
1775
 <body ondblclick="recollsearch()">
1819
1776
1820
     ----------------------------------------------------------------------
1821
1777
1822
3.3. Searching on the command line
1778
3.3. Searching on the command line
1823
1779
1824
   There are several ways to obtain search results as a text stream, without
1780
   There are several ways to obtain search results as a text stream, without
1825
   a graphical interface:
1781
   a graphical interface:
1826
1782
1827
     * By passing option -t to the recoll program.
1783
     o By passing option -t to the recoll program.
1828
1784
1829
     * By using the recollq program.
1785
     o By using the recollq program.
1830
1786
1831
     * By writing a custom Python program, using the Recoll Python API.
1787
     o By writing a custom Python program, using the Recoll Python API.
1832
1788
1833
   The first two methods work in the same way and accept/need the same
1789
   The first two methods work in the same way and accept/need the same
1834
   arguments (except for the additional -t to recoll). The query to be
1790
   arguments (except for the additional -t to recoll). The query to be
1835
   executed is specified as command line arguments.
1791
   executed is specified as command line arguments.
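
   For scripting, the first two methods can be wrapped as in the sketch
   below. It relies only on what is stated above (recollq, or recoll with
   -t, followed by the query words); the helper names are invented for
   the example and no assumption is made about the output format.

```python
import shutil
import subprocess


def build_command(query_words, use_gui_binary=False):
    """Return the argument list for a text-mode search."""
    if use_gui_binary:
        # The GUI program needs the additional -t option.
        return ["recoll", "-t", *query_words]
    return ["recollq", *query_words]


def run_query(query_words):
    """Run the search and return its raw text output, or None if the
    program is not installed on this machine."""
    cmd = build_command(query_words)
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True).stdout


print(build_command(["ilur"], use_gui_binary=True))
```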
1836
1792
...
...
1884
 text/html       [file:///Users/uncrypted-dockes/projets/bateaux/ilur/comptes.html]      [comptes.html]  18593   bytes  
1840
 text/html       [file:///Users/uncrypted-dockes/projets/bateaux/ilur/comptes.html]      [comptes.html]  18593   bytes  
1885
 text/html       [file:///Users/uncrypted-dockes/projets/nautique/webnautique/articles/ilur1/index.html] [Constructio...
1841
 text/html       [file:///Users/uncrypted-dockes/projets/nautique/webnautique/articles/ilur1/index.html] [Constructio...
1886
 text/html       [file:///Users/uncrypted-dockes/projets/pagepers/index.html]    [psxtcl/writemime/recoll]...
1842
 text/html       [file:///Users/uncrypted-dockes/projets/pagepers/index.html]    [psxtcl/writemime/recoll]...
1887
 text/html       [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/recu-chasse-maree....
1843
 text/html       [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/recu-chasse-maree....
1888
1844
1889
     ----------------------------------------------------------------------
1890
1891
3.4. The query language
1845
3.4. The query language
1892
1846
1893
   The query language processor is activated in the GUI simple search entry
1847
   The query language processor is activated in the GUI simple search entry
1894
   when the search mode selector is set to Query Language. It can also be
1848
   when the search mode selector is set to Query Language. It can also be
1895
   used with the KIO slave or the command line search. It broadly has the
1849
   used with the KIO slave or the command line search. It broadly has the
...
...
1917
   An element is composed of an optional field specification, and a value,
1871
   An element is composed of an optional field specification, and a value,
1918
   separated by a colon (the field separator is the last colon in the
1872
   separated by a colon (the field separator is the last colon in the
1919
   element). Example: Eugenie, author:balzac, dc:title:grandet
1873
   element). Example: Eugenie, author:balzac, dc:title:grandet
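
   The "last colon" rule can be illustrated with a few lines of Python.
   This is only a sketch of the rule as stated above, with an invented
   function name; it ignores quoting, so a quoted value that itself
   contains a colon is not handled.

```python
def split_element(element):
    """Split 'field:value' at the last colon.

    Returns (field, value); field is None when there is no colon.
    """
    field, sep, value = element.rpartition(":")
    return (field if sep else None, value)


for element in ["Eugenie", "author:balzac", "dc:title:grandet"]:
    print(split_element(element))
```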
1920
1874
1921
   The colon, if present, means "contains". Xesam defines other relations,
1875
   The colon, if present, means "contains". Xesam defines other relations,
1922
   which are mostly supported for now (except in special cases, described
1876
   which are mostly unsupported for now (except in special cases, described
1923
   further down).
1877
   further down).
1924
1878
1925
   All elements in the search entry are normally combined with an implicit
1879
   All elements in the search entry are normally combined with an implicit
1926
   AND. It is possible to specify that elements be OR'ed instead, as in
1880
   AND. It is possible to specify that elements be OR'ed instead, as in
1927
   Beatles OR Lennon. The OR must be entered literally (capitals), and it has
1881
   Beatles OR Lennon. The OR must be entered literally (capitals), and it has
...
...
1939
   Modifiers can be set on a phrase clause, for example to specify a
1893
   Modifiers can be set on a phrase clause, for example to specify a
1940
   proximity search (unordered). See the modifier section.
1894
   proximity search (unordered). See the modifier section.
1941
1895
1942
   Recoll currently manages the following default fields:
1896
   Recoll currently manages the following default fields:
1943
1897
1944
     * title, subject or caption are synonyms which specify data to be
1898
     o title, subject or caption are synonyms which specify data to be
1945
       searched for in the document title or subject.
1899
       searched for in the document title or subject.
1946
1900
1947
     * author or from for searching the document's originators.
1901
     o author or from for searching the document's originators.
1948
1902
1949
     * recipient or to for searching the document's recipients.
1903
     o recipient or to for searching the document's recipients.
1950
1904
1951
     * keyword for searching the document-specified keywords (few documents
1905
     o keyword for searching the document-specified keywords (few documents
1952
       actually have any).
1906
       actually have any).
1953
1907
1954
     * filename for the document's file name.
1908
     o filename for the document's file name.
1955
1909
1956
     * ext specifies the file name extension (Ex: ext:html)
1910
     o ext specifies the file name extension (Ex: ext:html)
1957
1911
1958
   The field syntax also supports a few field-like, but special, criteria:
1912
   The field syntax also supports a few field-like, but special, criteria:
1959
1913
1960
     * dir for filtering the results on file location (Ex:
1914
     o dir for filtering the results on file location (Ex:
1961
       dir:/home/me/somedir). -dir also works to find results not in the
1915
       dir:/home/me/somedir). -dir also works to find results not in the
1962
       specified directory (release >= 1.15.8). A tilde inside the value will
1916
       specified directory (release >= 1.15.8). A tilde inside the value will
1963
       be expanded to the home directory. Wildcards will not be expanded. You
1917
       be expanded to the home directory. Wildcards will not be expanded. You
1964
       cannot use OR with dir clauses (this restriction may go away in the
1918
       cannot use OR with dir clauses (this restriction may go away in the
1965
       future).
1919
       future).
...
...
       and are best avoided.

       You need to use double-quotes around the path value if it contains
       space characters.

     o size for filtering the results on file size. Example: size<10000. You
       can use <, > or = as operators. You can specify a range like the
       following: size>100 size<1000. The usual k/K, m/M, g/G, t/T can be
       used as (decimal) multipliers. Ex: size>1k to search for files bigger
       than 1000 bytes.

     o date for searching or filtering on dates. The syntax for the argument
       is based on the ISO 8601 standard for dates and time intervals. Only
       dates are supported, no times. The general syntax is 2 elements
       separated by a / character. Each element can be a date or a period of
       time. Periods are specified as PnYnMnD. The n numbers are the
       respective numbers of years, months or days, any of which may be
       missing. Dates are specified as YYYY-MM-DD. The days and months parts
       may be missing. If the / is present but an element is missing, the
       missing element is interpreted as the lowest or highest date in the
       index. Examples:

          o 2001-03-01/2002-05-01 the basic syntax for an interval of dates.

          o 2001-03-01/P1Y2M the same specified with a period.

          o 2001/ from the beginning of 2001 to the latest date in the index.

          o 2001 the whole year of 2001.

          o P2D/ means 2 days ago up to now if there are no documents with
            dates in the future.

          o /2003 all documents from 2003 or older.

       Periods can also be specified with small letters (e.g. p2y).

     o mime or format for specifying the mime type. This one is quite special
       because you can specify several values which will be OR'ed (the normal
       default for the language is AND). Ex: mime:text/plain mime:text/html.
       Specifying an explicit boolean operator before a mime specification is
       not supported and will produce strange results. You can filter out
       certain types by using negation (-mime:some/type), and you can use
       wildcards in the value (mime:text/*). Note that mime is the ONLY field
       with an OR default. You do need to use OR with ext terms for example.

     o type or rclcat for specifying the category (as in
       text/media/presentation/etc.). The classification of mime types in
       categories is defined in the Recoll configuration (mimeconf), and can
       be modified or extended. The default category names are those which
       permit filtering results in the main GUI screen. Categories are OR'ed
       like mime types above. This can't be negated with - either.
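
   The date interval rules listed above can be sketched in a few lines of
   Python. This is only an illustration of the documented syntax, not
   Recoll's own parser: the function names, and the today, index_min and
   index_max parameters (standing for the current day and the lowest and
   highest dates in the index) are assumptions made for the example. A
   period on the left with nothing on the right is counted back from today,
   as in the P2D/ example.

```python
import calendar
import re
from datetime import date, timedelta

# PnYnMnD, any part optional, upper or lower case letters.
PERIOD_RE = re.compile(r"[Pp](?:(\d+)[Yy])?(?:(\d+)[Mm])?(?:(\d+)[Dd])?$")


def parse_period(text):
    """Return (years, months, days) for a PnYnMnD period, else None."""
    match = PERIOD_RE.match(text)
    if match is None or not any(match.groups()):
        return None
    return tuple(int(group or 0) for group in match.groups())


def shift(day, period, sign):
    """Move a date forward (sign=+1) or backward (sign=-1) by a period."""
    years, months, days = period
    month_index = day.year * 12 + (day.month - 1) + sign * (years * 12 + months)
    year, month0 = divmod(month_index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    moved = date(year, month0 + 1, min(day.day, last_day))
    return moved + timedelta(days=sign * days)


def parse_date(text, end_of_range):
    """Parse YYYY, YYYY-MM or YYYY-MM-DD; missing parts widen the range."""
    parts = [int(part) for part in text.split("-")]
    year = parts[0]
    month = parts[1] if len(parts) > 1 else (12 if end_of_range else 1)
    if len(parts) > 2:
        day = parts[2]
    else:
        day = calendar.monthrange(year, month)[1] if end_of_range else 1
    return date(year, month, day)


def date_interval(spec, today, index_min, index_max):
    """Turn a date clause value into an inclusive (first, last) pair."""
    if "/" not in spec:
        # A lone date such as 2001 covers the whole year (or month, or day).
        return parse_date(spec, False), parse_date(spec, True)
    left, right = spec.split("/", 1)
    left_period, right_period = parse_period(left), parse_period(right)
    if left_period and right_period:
        raise ValueError("two periods do not define an interval")
    if not left:
        first = index_min            # missing start: lowest date in the index
    elif left_period is None:
        first = parse_date(left, False)
    else:
        first = None                 # computed below, once the end is known
    if not right:
        last = index_max             # missing end: highest date in the index
    elif right_period is None:
        last = parse_date(right, True)
    else:
        last = shift(first, right_period, +1)
    if first is None:
        # Assumption: count back from today when the end was left out,
        # otherwise count back from the explicit end date.
        first = shift(today if not right else last, left_period, -1)
    return first, last
```

   For example, date_interval("2001-03-01/P1Y2M", ...) yields the same pair
   of dates as the explicit 2001-03-01/2002-05-01 form.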
...
...
   The document filters used while indexing can create other fields with
   arbitrary names, and aliases may be defined in the configuration, so that
   the exact field search possibilities may be different for you if someone
   took care of the customisation.

  3.4.1. Modifiers

   Some characters are recognized as search modifiers when found immediately
   after the closing double quote of a phrase, as in "some
   term"modifierchars. The actual "phrase" can be a single term of course.
   Supported modifiers:

     o l can be used to turn off stemming (mostly makes sense with p because
       stemming is off by default for phrases).

     o o can be used to specify a "slack" for phrase and proximity searches:
       the number of additional terms that may be found between the specified
       ones. If o is followed by an integer number, this is the slack, else
       the default is 10.

     o p can be used to turn the default phrase search into a proximity one
       (unordered). Example: "order any in"p

     o C will turn on case sensitivity (if the index supports it).

     o D will turn on diacritics sensitivity (if the index supports it).

     o A weight can be specified for a query element by specifying a decimal
       value at the start of the modifiers. Example: "Important"2.5.

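   To make the modifier rules above concrete, here is a small Python sketch
   which splits a clause such as "some term"po3 into its parts. It is an
   illustration only, not the Recoll query parser: the function name and the
   dictionary keys are hypothetical, and it assumes that the weight, when
   present, comes first and that the remaining characters are only l, o
   (with an optional integer), p, C and D.

```python
import re

# "phrase", optional decimal weight, then any mix of l, p, C, D and o[number].
CLAUSE_RE = re.compile(r'"([^"]*)"(\d+(?:\.\d+)?)?((?:o\d*|[lpCD])*)$')

FLAG_NAMES = {
    "l": "no_stemming",
    "p": "proximity",
    "C": "case_sensitive",
    "D": "diacritics_sensitive",
}


def parse_phrase_clause(clause):
    """Split '"phrase"modifiers' into the phrase and its modifier settings."""
    match = CLAUSE_RE.match(clause)
    if match is None:
        raise ValueError("not a quoted clause with modifiers: %r" % clause)
    phrase, weight, chars = match.groups()
    parsed = {
        "phrase": phrase,
        "weight": float(weight) if weight else None,  # None: no weight given
        "slack": None,                                # None: no o modifier
    }
    for name in FLAG_NAMES.values():
        parsed[name] = False
    for token in re.findall(r"o\d*|[lpCD]", chars):
        if token.startswith("o"):
            # A bare o means the default slack of 10.
            parsed["slack"] = int(token[1:]) if len(token) > 1 else 10
        else:
            parsed[FLAG_NAMES[token]] = True
    return parsed
```

   With this sketch, parse_phrase_clause('"order any in"p') sets proximity,
   and parse_phrase_clause('"some term"o') reports a slack of 10.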
3.5. Search case and diacritics sensitivity

   For Recoll versions 1.18 and later, and when working with a raw index (not
   the default), searches can be made sensitive to character case and
...
...
   will search for the term resume exactly (resume will not be a match).

   When either case or diacritics sensitivity is activated, stem expansion is
   turned off. Having both does not make much sense.

3.6. Anchored searches and wildcards

   Some special characters are interpreted by Recoll in search strings to
   expand or specialize the search. Wildcards expand a root term in
   controlled ways. Anchor characters can restrict a search to succeed only
   if the match is found at or near the beginning of the document or one of
   its fields.

  3.6.1. More about wildcards

   All words entered in Recoll search fields will be processed for wildcard
   expansion before the request is finally executed.

   The wildcard characters are:

     o * which matches 0 or more characters.

     o ? which matches a single character.

     o [] which allow defining sets of characters to be matched (ex: [abc]
       matches a single character which may be 'a' or 'b' or 'c', [0-9]
       matches any digit).

   You should be aware of a few things before using wildcards.

     o Using a wildcard character at the beginning of a word can make for a
       slow search because Recoll will have to scan the whole index term list
       to find the matches.

     o When working with a raw index (preserving character case and
       diacritics), the literal part of a wildcard expression will be matched
       exactly for case and diacritics.

     o Using a * at the end of a word can produce more matches than you would
       think, and strange search results. You can use the term explorer tool
       to check what completions exist for a given term. You can also see
       exactly what search was performed by clicking on the link at the top
       of the result list. In general, for natural language terms, stem
       expansion will produce better results than an ending * (stem expansion
       is turned off when any wildcard character appears in the term).

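   The three wildcard forms behave like the shell-style patterns of the
   Python fnmatch module, which makes it easy to experiment with them. The
   sketch below expands a pattern against a small, made-up list of index
   terms. It only illustrates the matching rules: Recoll performs the
   expansion against its own index term list, and the expand_wildcard name
   and the sample terms are assumptions for the example.

```python
from fnmatch import fnmatchcase


def expand_wildcard(pattern, index_terms):
    """Return the terms matched by a pattern using *, ? and [] wildcards."""
    # Every term has to be tested, which is why a leading wildcard is slow
    # on a real index: no common prefix narrows down the candidates.
    return [term for term in index_terms if fnmatchcase(term, pattern)]


TERMS = ["car", "card", "care", "cat", "scar", "v1", "v2"]

trailing_star = expand_wildcard("car*", TERMS)   # * also matches 0 characters
single_char = expand_wildcard("ca?", TERMS)      # exactly one more character
char_set = expand_wildcard("v[0-9]", TERMS)      # one character from a set
leading_star = expand_wildcard("*ar", TERMS)     # forces a full scan
```

   fnmatchcase is used rather than fnmatch so that the literal part of the
   pattern is compared exactly, as happens with a raw index.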
  3.6.2. Anchored searches

   Two characters are used to specify that a search hit should occur at the
   beginning or at the end of the text. ^ at the beginning of a term or
   phrase constrains the search to happen at the start, $ at the end forces it
...
...
   structured documents like scientific articles, in case explicit metadata
   has not been supplied (a most frequent case), for example for looking for
   matches inside the abstract or the list of authors (which occur at the top
   of the document).

3.7. Desktop integration

   Being independent of the desktop type has its drawbacks: Recoll desktop
   integration is minimal. However there are a few tools available:

     o The KDE KIO Slave was described in a previous section.

     o If you use a recent version of Ubuntu Linux, you may find the Ubuntu
       Unity Lens module useful.

     o There is also an independently developed Krunner plugin.

   Here follow a few other things that may help.

  3.7.1. Hotkeying recoll

   It is surprisingly convenient to be able to show or hide the Recoll GUI
   with a single keystroke. Recoll comes with a small Python script, based on
   the libwnck window manager interface library, which will allow you to do
   just this. The detailed instructions are on this wiki page.

  3.7.2. The KDE Kicker Recoll applet

   This is probably obsolete now. Anyway:

...
...
   query (in query language form), and an icon which can be used to restrict
   the search to certain types of files. It is quite primitive, and launches
   a new recoll GUI instance every time (even if it is already running). You
   may find it useful anyway.

Chapter 4. Programming interface

   Recoll has an Application Programming Interface, usable both for indexing
   and searching, currently accessible from the Python language.

   Another less radical way to extend the application is to write filters for
   new types of documents.

   The processing of metadata attributes for documents (fields) is highly
   configurable.

4.1. Writing a document filter

   Recoll filters cooperate to translate from the multitude of input document
   formats, simple ones (such as opendocument or acrobat) or compound ones
   (such as Zip or Email), into the final Recoll indexing input format, which
   may be text/plain or text/html. Most filters are executable programs or
   scripts. A few filters are coded in C++ and live inside recollindex. This
   latter kind will not be described here.

   There are currently (1.18 and since 1.13) two kinds of external executable
   filters:

     o Simple filters (exec filters) run once and exit. They can be bare
       programs like antiword, or scripts using other programs. They are very
       simple to write, because they just need to print the converted
       document to the standard output. Their output can be text/plain or
       text/html.

     o Multiple filters (execm filters) run as long as their master process
       (recollindex) is active. They can process multiple files (sparing the
       process startup time which can be very significant), or multiple
       documents per file (e.g. for zip or chm files). They communicate with
       the indexer through a simple protocol, but are nevertheless a bit more
       complicated than the older kind. Most of the new filters are written
       in Python, using a common module to handle the protocol. There is an
       exception, rclimg, which is written in Perl. The subdocuments output
       by these filters can be directly indexable (text or HTML), or they can
       be other simple or compound documents that will need to be processed
       by another filter.

   In both cases, filters deal with regular file system files, and can
   process either a single document, or a linear list of documents in each
   file. Recoll is responsible for performing up to date checks, dealing with
   more complex embedding and other upper level issues.

   In the extreme case of a simple filter returning a document in text/plain
   format, no metadata can be transferred from the filter to the indexer.
   Generic metadata, like document size or modification date, will be
   gathered and stored by the indexer.

   Filters that produce text/html format can return an arbitrary amount of
   metadata inside HTML meta tags. These will be processed according to the
   directives found in the fields configuration file.

   The filters that can handle multiple documents per file return a single
   piece of data to identify each document inside the file. This piece of
   data, called an ipath element, will be sent back by Recoll to extract the
   document at query time, for previewing, or for creating a temporary file
   to be opened by a viewer.

   The following section describes the simple filters, and the next one gives
   a few explanations about the execm ones. You could conceivably write a
   simple filter with only the elements in the manual. This will not be the
   case for the other ones, for which you will have to look at the code.

  4.1.1. Simple filters

   Recoll simple filters are usually shell-scripts, but this is in no way
   necessary. Extracting the text from the native format is the difficult
...
...
   You should look at one of the simple filters, for example rclps, for a
   starting point.

   Don't forget to make your filter executable before testing!

  4.1.2. "Multiple" filters

   If you can program and want to write an execm filter, it should not be too
   difficult to make sense of one of the existing modules. For example, look
   at rclzip which uses Zip file paths as identifiers (ipath), and rclics,
   which uses an integer index. Also have a look at the comments inside the
   internfile/mh_execm.h file and possibly at the corresponding module.

   execm filters sometimes need to make a choice for the nature of the ipath
   elements that they use in communication with the indexer. Here are a few
   guidelines:

     o Use ASCII or UTF-8 (if the identifier is an integer print it, for
       example, like printf %d would do).

     o If at all possible, the data should make some kind of sense when
       printed to a log file to help with debugging.

     o Recoll uses a colon (:) as a separator to store a complex path
       internally (for deeper embedding). Colons inside the ipath elements
       output by a filter will be escaped, but would be a bad choice as a
       filter-specific separator (mostly, again, for debugging issues).

   In any case, the main goal is that it should be easy for the filter to
   extract the target document, given the file name and the ipath element.

   execm filters will also produce a document with a null ipath element.
   Depending on the type of document, this may have some associated data
   (e.g. the body of an email message), or none (typical for an archive
   file). If it is empty, this document will be useful anyway for some
   operations, as the parent of the actual data documents.

  4.1.3. Telling Recoll about the filter

   There are two elements that link a file to the filter which should process
   it: the association of file to mime type and the association of a mime
   type with a filter.

...
...
 application/x-chm = execm rclchm

   The fragment specifies that:

     o application/msword files are processed by executing the antiword
       program, which outputs text/plain encoded in utf-8.

     o application/ogg files are processed by the rclogg script, with default
       output type (text/html, with encoding specified in the header, or
       utf-8 by default).

     o text/rtf is processed by unrtf, which outputs text/html. The
       iso-8859-1 encoding is specified because it is not the utf-8 default,
       and not output by unrtf in the HTML header section.

     o application/x-chm is processed by a persistent filter. This is
       determined by the execm keyword.

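   As an illustration of how such an association can be read, the Python
   sketch below splits one line of the fragment into the mime type, the
   filter kind (exec or execm) and the command to run. It is not the Recoll
   configuration parser: the parse_filter_line name is hypothetical, and the
   sketch deliberately ignores any output type or encoding attributes that a
   real line may carry after the command.

```python
def parse_filter_line(line):
    """Split 'mimetype = exec|execm command args' into its three parts."""
    mime, separator, value = line.partition("=")
    words = value.split()
    if not separator or len(words) < 2:
        raise ValueError("not a filter association: %r" % line)
    kind, command = words[0], words[1:]
    if kind not in ("exec", "execm"):
        raise ValueError("unknown filter kind: %r" % kind)
    return mime.strip(), kind, command


# The execm keyword marks a persistent filter, exec a run-once filter.
mime_type, kind, command = parse_filter_line(" application/x-chm = execm rclchm")
is_persistent = kind == "execm"
```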
  4.1.4. Filter HTML output

   The output HTML could be very minimal like the following example:

 <html><head>
 <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
...
...
 <meta name="somefield" content="Some textual data" />

   See the following section for details about configuring how field data is
   processed by the indexer.

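   A filter written in Python could build this kind of output with a helper
   like the one sketched below. This is a minimal example under stated
   assumptions, not code taken from an existing Recoll filter: the
   filter_html name is hypothetical, and wrapping the text in a pre element
   is only one possible choice for the body. The point it illustrates is
   that both the text and the field values must be escaped before being
   placed in the HTML.

```python
import html


def filter_html(text, fields=None):
    """Wrap extracted text in a minimal HTML document with meta fields."""
    lines = [
        "<html><head>",
        '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">',
    ]
    for name, value in (fields or {}).items():
        # Each field becomes one meta element for the indexer to process.
        lines.append('<meta name="%s" content="%s" />'
                     % (html.escape(name), html.escape(value)))
    lines.append("</head><body><pre>")
    lines.append(html.escape(text))
    lines.append("</pre></body></html>")
    return "\n".join(lines)


document = filter_html("a < b", {"somefield": "Some textual data"})
```

   A simple filter would then just print the returned string to its standard
   output.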
  4.1.5. Page numbers

   The indexer will interpret ^L (form feed) characters in the filter output
   as indicating page breaks, and will record them. At query time, this
   allows starting a viewer on the right page for a hit or a snippet.
   Currently, only the PDF, Postscript and DVI filters generate page breaks.

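   The idea can be shown with a short Python sketch: the page containing a
   given position in the filter output is one more than the number of form
   feed characters found before it. The page_for_offset name and the sample
   text are made up for the example, which says nothing about how the
   indexer actually stores page break positions.

```python
FORM_FEED = "\f"  # the ^L character


def page_for_offset(filter_output, offset):
    """Return the 1-based page number holding the character at offset."""
    return filter_output.count(FORM_FEED, 0, offset) + 1


sample = "page one" + FORM_FEED + "page two" + FORM_FEED + "page three"
hit_page = page_for_offset(sample, sample.index("two"))
page_count = sample.count(FORM_FEED) + 1
```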
4.2. Field data processing

   Fields are named pieces of information in or about documents, like title,
   author, abstract.

...
...
2433
   Recoll defines a number of default fields. Additional ones can be output
2417
   Recoll defines a number of default fields. Additional ones can be output
2434
   by filters, and described in the fields configuration file.
2418
   by filters, and described in the fields configuration file.
2435
2419
2436
   Fields can be:
2420
   Fields can be:
2437
2421
2438
     * indexed, meaning that their terms are separately stored in inverted
2422
     o indexed, meaning that their terms are separately stored in inverted
2439
       lists (with a specific prefix), and that a field-specific search is
2423
       lists (with a specific prefix), and that a field-specific search is
2440
       possible.
2424
       possible.
2441
2425
2442
     * stored, meaning that their value is recorded in the index data record
2426
     o stored, meaning that their value is recorded in the index data record
2443
       for the document, and can be returned and displayed with search
2427
       for the document, and can be returned and displayed with search
2444
       results.
2428
       results.
2445
2429
2446
   A field can be indexed, stored, or both. These and other aspects
2430
   A field can be indexed, stored, or both. These and other aspects
2447
   of field handling are defined inside the fields configuration file.
2431
   of field handling are defined inside the fields configuration file.
2448
2432
2449
   The sequence of events for field processing is as follows:
2433
   The sequence of events for field processing is as follows:
2450
2434
2451
     * During indexing, recollindex scans all meta fields in HTML documents
2435
     o During indexing, recollindex scans all meta fields in HTML documents
2452
       (most document types are transformed into HTML at some point). It
2436
       (most document types are transformed into HTML at some point). It
2453
       compares the name for each element to the configuration defining what
2437
       compares the name for each element to the configuration defining what
2454
       should be done with fields (the fields file)
2438
       should be done with fields (the fields file)
2455
2439
2456
     * If the name for the meta element matches one for a field that should
2440
     o If the name for the meta element matches one for a field that should
2457
       be indexed, the contents are processed and the terms are entered into
2441
       be indexed, the contents are processed and the terms are entered into
2458
       the index with the prefix defined in the fields file.
2442
       the index with the prefix defined in the fields file.
2459
2443
2460
     * If the name for the meta element matches one for a field that should
2444
     o If the name for the meta element matches one for a field that should
2461
       be stored, the content of the element is stored with the document data
2445
       be stored, the content of the element is stored with the document data
2462
       record, from which it can be extracted and displayed at query time.
2446
       record, from which it can be extracted and displayed at query time.
2463
2447
2464
     * At query time, if a field search is performed, the index prefix is
2448
     o At query time, if a field search is performed, the index prefix is
2465
       computed and the match is only performed against appropriately
2449
       computed and the match is only performed against appropriately
2466
       prefixed terms in the index.
2450
       prefixed terms in the index.
2467
2451
2468
     * At query time, the field can be displayed inside the result list by
2452
     o At query time, the field can be displayed inside the result list by
2469
       using the appropriate directive in the definition of the result list
2453
       using the appropriate directive in the definition of the result list
2470
       paragraph format. All fields are displayed on the fields screen of the
2454
       paragraph format. All fields are displayed on the fields screen of the
2471
       preview window (which you can reach through the right-click menu).
2455
       preview window (which you can reach through the right-click menu).
2472
       This is independent of whether the search which produced the
2456
       This is independent of whether the search which produced the
2473
       results used the field.
2457
       results used the field.
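
   The sequence above can be sketched with a toy model. This is only an
   illustration of the indexed/stored distinction, not Recoll's
   implementation: the ToyIndex class, the prefix letters and the field
   names are all assumptions chosen for the example, and a plain dictionary
   stands in for the fields file.

```python
from collections import defaultdict

# Hypothetical stand-in for the fields file: which fields are indexed
# (and with which term prefix) and which are stored in the data record.
INDEXED_PREFIXES = {"title": "S", "author": "A"}
STORED_FIELDS = {"title", "pagecount"}


class ToyIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # prefixed term -> document ids
        self.records = {}                 # document id -> stored field values

    def add_document(self, doc_id, meta):
        """Process the meta fields of one document, as at indexing time."""
        record = {}
        for name, value in meta.items():
            prefix = INDEXED_PREFIXES.get(name)
            if prefix is not None:
                # Indexed field: enter each term with the field prefix.
                for term in value.lower().split():
                    self.postings[prefix + term].add(doc_id)
            if name in STORED_FIELDS:
                # Stored field: keep the value for display at query time.
                record[name] = value
        self.records[doc_id] = record

    def field_search(self, name, term):
        """Match only against terms carrying the prefix for this field."""
        prefix = INDEXED_PREFIXES[name]
        return sorted(self.postings.get(prefix + term.lower(), set()))


index = ToyIndex()
index.add_document(1, {"title": "Recoll manual", "author": "Jane Doe",
                       "pagecount": "120"})
index.add_document(2, {"title": "Doe family history", "author": "John Smith"})

print(index.field_search("author", "doe"))
print(index.field_search("title", "doe"))
print(index.records[1])
```

   In this sketch, author is indexed but not stored, so it can be searched
   but does not appear in the record, while pagecount is stored but not
   indexed, so it can be displayed but not searched as a field.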
...
...
2476
   comments inside the file.
2460
   comments inside the file.
2477
2461
2478
   You can also have a look at the example on the Wiki, detailing how one
2462
   You can also have a look at the example on the Wiki, detailing how one
2479
   could add a page count field to pdf documents for displaying inside result
2463
   could add a page count field to pdf documents for displaying inside result
2480
   lists.
2464
   lists.
2481
2482
     ----------------------------------------------------------------------
2483
2465
2484
4.3. API
2466
4.3. API
2485
2467
2486
  4.3.1. Interface elements
2468
  4.3.1. Interface elements
2487
2469
...
...
2520
   is not used at all). The reason is that the main document indexer purge
2502
   is not used at all). The reason is that the main document indexer purge
2521
   pass would remove all the other indexer's documents, as they were not seen
2503
   pass would remove all the other indexer's documents, as they were not seen
2522
   during indexing. The main indexer documents would also probably be a
2504
   during indexing. The main indexer documents would also probably be a
2523
   problem for the external indexer purge operation.
2505
   problem for the external indexer purge operation.
2524
2506
2525
     ----------------------------------------------------------------------
2526
2527
  4.3.2. Python interface
2507
  4.3.2. Python interface
2528
2508
2529
    4.3.2.1. Introduction
2509
    4.3.2.1. Introduction
2530
2510
2531
   Recoll versions after 1.11 define a Python programming interface, both for
2511
   Recoll versions after 1.11 define a Python programming interface, both for
...
...
2549
   can then use to build and install the module:
2529
   can then use to build and install the module:
2550
2530
2551
   cd recoll-xxx/python/recoll
2531
   cd recoll-xxx/python/recoll
2552
   python setup.py build
2532
   python setup.py build
2553
   python setup.py install
2533
   python setup.py install
2554
2555
     ----------------------------------------------------------------------
2556
2534
2557
    4.3.2.2. Interface manual
2535
    4.3.2.2. Interface manual
2558
2536
2559
   NAME
2537
   NAME
2560
       recoll - This is an interface to the Recoll full text indexer.
2538
       recoll - This is an interface to the Recoll full text indexer.
...
...
2672
        |  
2650
        |  
2673
        |  Methods defined here:
2651
        |  Methods defined here:
2674
        |  
2652
        |  
2675
        |  
2653
        |  
2676
        |  execute(...)
2654
        |  execute(...)
2677
        |      execute(query_string, stemming=1|0)
2655
        |      execute(query_string, stemming=1|0, stemlang="stemming language")
2678
        |      
2656
        |      
2679
        |      Starts a search for query_string, a Recoll search language string
2657
        |      Starts a search for query_string, a Recoll search language string
2680
        |      (mostly Xesam-compatible).
2658
        |      (mostly Xesam-compatible).
2681
        |      The query can be a simple list of terms (and'ed by default), or more
2659
        |      The query can be a simple list of terms (and'ed by default), or more
2682
        |      complicated with field specs etc. See the Recoll manual.
2660
        |      complicated with field specs etc. See the Recoll manual.
...
...
2738
           confdir specifies a Recoll configuration directory
2716
           confdir specifies a Recoll configuration directory
2739
           (the default is built like for any Recoll program).
2717
           (the default is built like for any Recoll program).
2740
           extra_dbs is a list of external databases (xapian directories)
2718
           extra_dbs is a list of external databases (xapian directories)
2741
           writable decides if we can index new data through this connection
2719
           writable decides if we can index new data through this connection
2742
2720
2743
     ----------------------------------------------------------------------
2744
2745
    4.3.2.3. Example code
2721
    4.3.2.3. Example code
2746
2722
2747
   The following sample would query the index with a user language string.
2723
   The following sample would query the index with a user language string.
2748
   See the python/samples directory inside the Recoll source for other
2724
   See the python/samples directory inside the Recoll source for other
2749
   examples.
2725
   examples.
2750
2726
2751
 #!/usr/bin/env python
2727
 #!/usr/bin/env python
2728
2752
 import recoll
2729
 import recoll
2753
2730
2754
 db = recoll.connect()
2731
 db = recoll.connect()
2755
 db.setAbstractParams(maxchars=80, contextwords=2)
2732
 db.setAbstractParams(maxchars=80, contextwords=2)
2756
2733
...
...
2767
     abs = db.makeDocAbstract(doc, query).encode('utf-8')
2744
     abs = db.makeDocAbstract(doc, query).encode('utf-8')
2768
     print abs
2745
     print abs
2769
     print
2746
     print
2770
2747
2771
2748
2772
     ----------------------------------------------------------------------
2773
2749
2750
2774
                   Chapter 5. Installation and configuration
2751
Chapter 5. Installation and configuration
2775
2752
2776
5.1. Installing a binary copy
2753
5.1. Installing a binary copy
2777
2754
2778
   There are three types of binary Recoll installations:
2755
   There are three types of binary Recoll installations:
2779
2756
2780
     * Through your system's normal software distribution framework (e.g.,
2757
     o Through your system's normal software distribution framework (e.g.,
2781
       Debian/Ubuntu apt, FreeBSD ports, etc.).
2758
       Debian/Ubuntu apt, FreeBSD ports, etc.).
2782
2759
2783
     * From a package downloaded from the Recoll web site.
2760
     o From a package downloaded from the Recoll web site.
2784
2761
2785
     * From a prebuilt tree downloaded from the Recoll web site.
2762
     o From a prebuilt tree downloaded from the Recoll web site.
2786
2763
2787
   In all cases, the strict software dependencies (e.g. on Xapian or iconv)
2764
   In all cases, the strict software dependencies (e.g. on Xapian or iconv)
2788
   will be automatically satisfied; you should not have to worry about them.
2765
   will be automatically satisfied; you should not have to worry about them.
2789
2766
2790
   You will only have to check or install supporting applications for the
2767
   You will only have to check or install supporting applications for the
...
...
2793
2770
2794
   You may also want to look at the configuration section (but this
2771
   You may also want to look at the configuration section (but this
2795
   may not be necessary for a quick test with default parameters). Most
2772
   may not be necessary for a quick test with default parameters). Most
2796
   parameters can be more conveniently set from the GUI interface.
2773
   parameters can be more conveniently set from the GUI interface.
2797
2774
2798
     ----------------------------------------------------------------------
2799
2800
  5.1.1. Installing through a package system
2775
  5.1.1. Installing through a package system
2801
2776
2802
   If you use a BSD-type port system or a prebuilt package (DEB, RPM,
2777
   If you use a BSD-type port system or a prebuilt package (DEB, RPM,
2803
   manually or through the system software configuration utility), just
2778
   manually or through the system software configuration utility), just
2804
   follow the usual procedure for your system.
2779
   follow the usual procedure for your system.
2805
2780
2806
     ----------------------------------------------------------------------
2807
2808
  5.1.2. Installing a prebuilt Recoll
2781
  5.1.2. Installing a prebuilt Recoll
2809
2782
2810
   The unpackaged binary versions on the Recoll web site are just compressed
2783
   The unpackaged binary versions on the Recoll web site are just compressed
2811
   tar files of a build tree, where only the useful parts were kept
2784
   tar files of a build tree, where only the useful parts were kept
2812
   (executables and sample configuration).
2785
   (executables and sample configuration).
...
...
2815
   libiconv, to make installation easier (no dependencies).
2788
   libiconv, to make installation easier (no dependencies).
2816
2789
2817
   After extracting the tar file, you can proceed with installation as if you
2790
   After extracting the tar file, you can proceed with installation as if you
2818
   had built the package from source (that is, just type make install). The
2791
   had built the package from source (that is, just type make install). The
2819
   binary trees are built for installation to /usr/local.
2792
   binary trees are built for installation to /usr/local.
2820
2821
     ----------------------------------------------------------------------
2822
2793
2823
5.2. Supporting packages
2794
5.2. Supporting packages
2824
2795
2825
   Recoll uses external applications to index some file types. You need to
2796
   Recoll uses external applications to index some file types. You need to
2826
   install them for the file types that you wish to have indexed (these are
2797
   install them for the file types that you wish to have indexed (these are
...
...
2850
   by ad hoc filter code now use the xsltproc command, which usually comes
2821
   by ad hoc filter code now use the xsltproc command, which usually comes
2851
   with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.
2822
   with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.
2852
2823
2853
   Now for the list:
2824
   Now for the list:
2854
2825
2855
     * Openoffice files need unzip and xsltproc.
2826
     o Openoffice files need unzip and xsltproc.
2856
2827
2857
     * PDF files need pdftotext which is part of the Xpdf or Poppler
2828
     o PDF files need pdftotext which is part of the Xpdf or Poppler
2858
       packages.
2829
       packages.
2859
2830
2860
     * Postscript files need pstotext. The original version has an issue with
2831
     o Postscript files need pstotext. The original version has an issue with
2861
       shell characters in file names, which is corrected in recent packages.
2832
       shell characters in file names, which is corrected in recent packages.
2862
       See the Recoll helper applications page for more detail.
2833
       See the Recoll helper applications page for more detail.
2863
2834
2864
     * MS Word needs antiword. It is also useful to have wvWare installed as
2835
     o MS Word needs antiword. It is also useful to have wvWare installed as
2865
       it may be used as a fallback for some files which antiword does not
2836
       it may be used as a fallback for some files which antiword does not
2866
       handle.
2837
       handle.
2867
2838
2868
     * MS Excel and PowerPoint need catdoc.
2839
     o MS Excel and PowerPoint need catdoc.
2869
2840
2870
     * MS Open XML (docx) needs xsltproc.
2841
     o MS Open XML (docx) needs xsltproc.
2871
2842
2872
     * Wordperfect files need wpd2html from the libwpd (or libwpd-tools on
2843
     o Wordperfect files need wpd2html from the libwpd (or libwpd-tools on
2873
       Ubuntu) package.
2844
       Ubuntu) package.
2874
2845
2875
     * RTF files need unrtf, which, in its standard version, has much trouble
2846
     o RTF files need unrtf, which, in its standard version, has much trouble
2876
       with non-western character sets. Check the Recoll helper applications
2847
       with non-western character sets. Check the Recoll helper applications
2877
       page.
2848
       page.
2878
2849
2879
     * TeX files need untex or detex. Check the Recoll helper applications
2850
     o TeX files need untex or detex. Check the Recoll helper applications
2880
       page for sources if it's not packaged for your distribution.
2851
       page for sources if it's not packaged for your distribution.
2881
2852
2882
     * dvi files need dvips.
2853
     o dvi files need dvips.
2883
2854
2884
     * djvu files need djvutxt and djvused from the DjVuLibre package.
2855
     o djvu files need djvutxt and djvused from the DjVuLibre package.
2885
2856
2886
     * Audio files: Recoll releases before 1.13 used the id3info command from
2857
     o Audio files: Recoll releases before 1.13 used the id3info command from
2887
       the id3lib package to extract mp3 tag information, metaflac (standard
2858
       the id3lib package to extract mp3 tag information, metaflac (standard
2888
       flac tools) for flac files, and ogginfo (vorbis tools) for ogg files.
2859
       flac tools) for flac files, and ogginfo (vorbis tools) for ogg files.
2889
       Releases 1.14 and later use a single Python filter based on mutagen
2860
       Releases 1.14 and later use a single Python filter based on mutagen
2890
       for all audio file types.
2861
       for all audio file types.
2891
2862
2892
     * Pictures: Recoll uses the Exiftool Perl package to extract tag
2863
     o Pictures: Recoll uses the Exiftool Perl package to extract tag
2893
       information. Most image file formats are supported. Note that there
2864
       information. Most image file formats are supported. Note that there
2894
       may not be much interest in indexing the technical tags (image size,
2865
       may not be much interest in indexing the technical tags (image size,
2895
       aperture, etc.). This is only of interest if you store personal tags
2866
       aperture, etc.). This is only of interest if you store personal tags
2896
       or textual descriptions inside the image files.
2867
       or textual descriptions inside the image files.
2897
2868
2898
     * chm: files in Microsoft help format need Python and the pychm module
2869
     o chm: files in Microsoft help format need Python and the pychm module
2899
       (which needs chmlib).
2870
       (which needs chmlib).
2900
2871
2901
     * ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar
2872
     o ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar
2902
       module. icalendar is not needed for newer versions, which use internal
2873
       module. icalendar is not needed for newer versions, which use internal
2903
       code.
2874
       code.
2904
2875
2905
     * Zip archives need Python (and the standard zipfile module).
2876
     o Zip archives need Python (and the standard zipfile module).
2906
2877
2907
     * Rar archives need Python, the rarfile Python module and the unrar
2878
     o Rar archives need Python, the rarfile Python module and the unrar
2908
       utility.
2879
       utility.
2909
2880
2910
     * Midi karaoke files need Python and the Midi module
2881
     o Midi karaoke files need Python and the Midi module
2911
2882
2912
     * Konqueror webarchive files need Python (uses the tarfile module).
2883
     o Konqueror webarchive files need Python (uses the tarfile module).
2913
2884
2914
     * mimehtml web archive format (support based on the email filter, which
2885
     o mimehtml web archive format (support based on the email filter, which
2915
       introduces some mild weirdness, but still usable).
2886
       introduces some mild weirdness, but still usable).
2916
2887
2917
   Text, HTML, email folders, and Scribus files are processed internally. Lyx
2888
   Text, HTML, email folders, and Scribus files are processed internally. Lyx
2918
   is used to index Lyx files. Many filters need iconv and the standard sed
2889
   is used to index Lyx files. Many filters need iconv and the standard sed
2919
   and awk.
2890
   and awk.
2920
2891
2921
     ----------------------------------------------------------------------
2922
2923
5.3. Building from source
2892
5.3. Building from source
2924
2893
2925
  5.3.1. Prerequisites
2894
  5.3.1. Prerequisites
2926
2895
2927
   C++ compiler. Up to Recoll version 1.13.04, its absence can manifest
2896
   C++ compiler. Up to Recoll version 1.13.04, its absence can manifest
2928
   itself by strange messages about a missing iconv_open.
2897
   itself by strange messages about a missing iconv_open.
2929
2898
2930
   Development files for Xapian core.
2899
   Development files for Xapian core.
2931
2900
2901
  Important
2902
2932
     Important: If you are building Xapian for an older CPU (before Pentium 4
2903
   If you are building Xapian for an older CPU (before Pentium 4 or Athlon
2933
     or Athlon 64), you need to add the --disable-sse flag to the configure
2904
   64), you need to add the --disable-sse flag to the configure command.
2934
     command. Otherwise, all Xapian applications will crash with an illegal
2905
   Otherwise, all Xapian applications will crash with an illegal instruction error.
2935
     instruction error.
2936
2906
2937
   Development files for Qt.
2907
   Development files for Qt.
2938
2908
2939
   Development files for X11 and zlib.
2909
   Development files for X11 and zlib.
2940
2910
...
...
2945
   are using FreeBSD, there is a port).
2915
   are using FreeBSD, there is a port).
2946
2916
2947
   You may also need libiconv. Recoll currently uses version 1.9 (this should
2917
   You may also need libiconv. Recoll currently uses version 1.9 (this should
2948
   not be critical). On Linux systems, the iconv interface is part of libc
2918
   not be critical). On Linux systems, the iconv interface is part of libc
2949
   and you should not need to do anything special.
2919
   and you should not need to do anything special.
2950
2951
     ----------------------------------------------------------------------
2952
2920
2953
  5.3.2. Building
2921
  5.3.2. Building
2954
2922
2955
   Recoll has been built on Linux, FreeBSD, Mac OS X, and Solaris; most
2923
   Recoll has been built on Linux, FreeBSD, Mac OS X, and Solaris; most
2956
   versions after 2005 should be ok, maybe some older ones too (Solaris 8 is
2924
   versions after 2005 should be ok, maybe some older ones too (Solaris 8 is
...
...
2958
   very much welcome patches.
2926
   very much welcome patches.
2959
2927
2960
   Depending on the Qt 3 configuration on your system, you may have to set
2928
   Depending on the Qt 3 configuration on your system, you may have to set
2961
   the QTDIR and QMAKESPECS variables in your environment:
2929
   the QTDIR and QMAKESPECS variables in your environment:
2962
2930
2963
     * QTDIR should point to the directory above the one that holds the qt
2931
     o QTDIR should point to the directory above the one that holds the qt
2964
       include files (e.g. if qt.h is /usr/local/qt/include/qt.h, QTDIR should
2932
       include files (e.g. if qt.h is /usr/local/qt/include/qt.h, QTDIR should
2965
       be /usr/local/qt).
2933
       be /usr/local/qt).
2966
2934
2967
     * QMAKESPECS should be set to the name of one of the Qt mkspecs
2935
     o QMAKESPECS should be set to the name of one of the Qt mkspecs
2968
       sub-directories (e.g. linux-g++).
2936
       sub-directories (e.g. linux-g++).
2969
2937
2970
   On many Linux systems, QTDIR is set by the login scripts, and QMAKESPECS
2938
   On many Linux systems, QTDIR is set by the login scripts, and QMAKESPECS
2971
   is not needed because there is a default link in mkspecs/.
2939
   is not needed because there is a default link in mkspecs/.
2972
2940
2973
   Neither QTDIR nor QMAKESPECS should be needed with Qt 4: configuration
2941
   Neither QTDIR nor QMAKESPECS should be needed with Qt 4: configuration
2974
   details are entirely determined by qmake (which is quite often installed
2942
   details are entirely determined by qmake (which is quite often installed
2975
   as qmake-qt4).
2943
   as qmake-qt4).
2976
2944
2977
   Configure options:
2945
   Configure options: 
2978
2946
2979
     * --without-aspell will disable the code for phonetic matching of search
2947
     o --without-aspell will disable the code for phonetic matching of search
2980
       terms.
2948
       terms.
2981
2949
2982
     * --with-fam or --with-inotify will enable the code for real time
2950
     o --with-fam or --with-inotify will enable the code for real time
2983
       indexing. Inotify support is enabled by default on recent Linux
2951
       indexing. Inotify support is enabled by default on recent Linux
2984
       systems.
2952
       systems.
2985
2953
2986
     * --disable-webkit is available from version 1.17 to implement the
2954
     o --disable-webkit is available from version 1.17 to implement the
2987
       result list with a Qt QTextBrowser instead of a WebKit widget if you
2955
       result list with a Qt QTextBrowser instead of a WebKit widget if you
2988
       do not or can't depend on the latter.
2956
       do not or can't depend on the latter.
2989
2957
2990
     * --enable-xattr will enable code to fetch data from file extended
2958
     o --enable-xattr will enable code to fetch data from file extended
2991
       attributes. This is only useful if some application stores data in
2959
       attributes. This is only useful if some application stores data in
2992
       there, and also needs some simple configuration (see comments in the
2960
       there, and also needs some simple configuration (see comments in the
2993
       fields configuration file).
2961
       fields configuration file).
2994
2962
2995
     * --enable-camelcase will enable splitting camelCase words. This is not
2963
     o --enable-camelcase will enable splitting camelCase words. This is not
2996
       enabled by default as it has the unfortunate side-effect of making
2964
       enabled by default as it has the unfortunate side-effect of making
2997
       some phrase searches quite confusing: e.g., "MySQL manual" would be
2965
       some phrase searches quite confusing: e.g., "MySQL manual" would be
2998
       matched by "MySQL manual" and "my sql manual" but not "mysql manual"
2966
       matched by "MySQL manual" and "my sql manual" but not "mysql manual"
2999
       (only inside phrase searches).
2967
       (only inside phrase searches).
3000
2968
3001
     * --with-file-command Specify the version of the 'file' command to use
2969
     o --with-file-command Specify the version of the 'file' command to use
3002
       (ie: --with-file-command=/usr/local/bin/file). Can be useful to enable
2970
       (ie: --with-file-command=/usr/local/bin/file). Can be useful to enable
3003
       the gnu version on systems where the native one is bad.
2971
       the gnu version on systems where the native one is bad.
3004
2972
3005
     * --disable-qtgui Disable the Qt interface. Will allow building the
2973
     o --disable-qtgui Disable the Qt interface. Will allow building the
3006
       indexer and the command line search program in absence of a Qt
2974
       indexer and the command line search program in absence of a Qt
3007
       environment.
2975
       environment.
3008
2976
3009
     * --disable-x11mon Disable X11 connection monitoring inside recollindex.
2977
     o --disable-x11mon Disable X11 connection monitoring inside recollindex.
3010
       Together with --disable-qtgui, this allows building recoll without Qt
2978
       Together with --disable-qtgui, this allows building recoll without Qt
3011
       and X11.
2979
       and X11.
3012
2980
3013
     * Of course, the usual autoconf configure options, like --prefix, apply.
2981
     o Of course, the usual autoconf configure options, like --prefix, apply.
3014
2982
3015
   Normal procedure:
2983
   Normal procedure:
3016
2984
3017
         cd recoll-xxx
2985
         cd recoll-xxx
3018
         ./configure
2986
         ./configure
...
...
3023
   There is little auto-configuration. The configure script will mainly link
2991
   There is little auto-configuration. The configure script will mainly link
3024
   one of the system-specific files in the mk directory to mk/sysconf. If
2992
   one of the system-specific files in the mk directory to mk/sysconf. If
3025
   your system is not known yet, it will tell you as much, and you may want
2993
   your system is not known yet, it will tell you as much, and you may want
3026
   to manually copy and modify one of the existing files (the new file name
2994
   to manually copy and modify one of the existing files (the new file name
3027
   should be the output of uname -s).
2995
   should be the output of uname -s).
3028
3029
     ----------------------------------------------------------------------
3030
2996
3031
  5.3.3. Installation
2997
  5.3.3. Installation
3032
2998
3033
   Either type make install or execute recollinstall prefix, in the root of
2999
   Either type make install or execute recollinstall prefix, in the root of
3034
   the source tree. This will copy the commands to prefix/bin and the sample
3000
   the source tree. This will copy the commands to prefix/bin and the sample
...
...
3040
   RECOLL_DATADIR environment variable to indicate where the shared data is
3006
   RECOLL_DATADIR environment variable to indicate where the shared data is
3041
   to be found (ie for (ba)sh: export
3007
   to be found (ie for (ba)sh: export
3042
   RECOLL_DATADIR=/some/path/share/recoll).
3008
   RECOLL_DATADIR=/some/path/share/recoll).
3043
3009
3044
   You can then proceed to configuration.
3010
   You can then proceed to configuration.
3045
3046
     ----------------------------------------------------------------------
3047
3011
3048
5.4. Configuration overview
3012
5.4. Configuration overview
3049
3013
3050
   Most of the parameters specific to the recoll GUI are set through the
3014
   Most of the parameters specific to the recoll GUI are set through the
3051
   Preferences menu and stored in the standard Qt place
3015
   Preferences menu and stored in the standard Qt place
...
...
3096
         defaultcharset = utf-8
3060
         defaultcharset = utf-8
3097
        
3061
        
3098
3062
3099
   There are three kinds of lines:
3063
   There are three kinds of lines:
3100
3064
3101
     * Comment (starts with #) or empty.
3065
     o Comment (starts with #) or empty.
3102
3066
3103
     * Parameter assignment (name = value).
3067
     o Parameter assignment (name = value).
3104
3068
3105
     * Section definition ([somedirname]).
3069
     o Section definition ([somedirname]).
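
   The three kinds of lines can be told apart with very little code. The
   following is a minimal sketch, not Recoll's own parser: it ignores
   quoting and any other syntax details, the parse_config name is
   hypothetical, and the directory name in the sample is made up.

```python
def parse_config(text):
    """Classify each line and group parameter assignments by section."""
    sections = {"": {}}  # "" holds parameters set before any section
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comment or empty line
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]  # section definition
            sections.setdefault(current, {})
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError("unrecognized line: %r" % raw)
        # Parameter assignment, recorded in the section currently in effect.
        sections[current][name.strip()] = value.strip()
    return sections


sample = """\
# global parameters
defaultcharset = utf-8

[/home/me/olddocs]
defaultcharset = iso-8859-1
"""

conf = parse_config(sample)
print(conf[""]["defaultcharset"])
print(conf["/home/me/olddocs"]["defaultcharset"])
```

   Each assignment is attached to the most recent section definition, which
   is how a parameter can be redefined for a directory sub-tree.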
3106
3070
3107
   Depending on the type of configuration file, section definitions either
3071
   Depending on the type of configuration file, section definitions either
3108
   separate groups of parameters or allow redefining some parameters for a
3072
   separate groups of parameters or allow redefining some parameters for a
3109
   directory sub-tree. They stay in effect until another section definition,
3073
   directory sub-tree. They stay in effect until another section definition,
3110
   or the end of file, is encountered. Some of the parameters used for
3074
   or the end of file, is encountered. Some of the parameters used for
...
...
3119
   embedded spaces can be quoted using double-quotes.
3083
   embedded spaces can be quoted using double-quotes.
3120
3084
3121
   Encoding issues. Most of the configuration parameters are plain ASCII. Two
3085
   Encoding issues. Most of the configuration parameters are plain ASCII. Two
3122
   particular sets of values may cause encoding issues:
3086
   particular sets of values may cause encoding issues:
3123
3087
3124
     * File path parameters may contain non-ascii characters and should use
3088
     o File path parameters may contain non-ascii characters and should use
3125
       the exact same byte values as found in the file system directory.
3089
       the exact same byte values as found in the file system directory.
3126
       Usually, this means that the configuration file should use the system
3090
       Usually, this means that the configuration file should use the system
3127
       default locale encoding.
3091
       default locale encoding.
3128
3092
3129
     * The unac_except_trans parameter should be encoded in UTF-8. If your
3093
     o The unac_except_trans parameter should be encoded in UTF-8. If your
3130
       system locale is not UTF-8, and you need to also specify non-ascii
3094
       system locale is not UTF-8, and you need to also specify non-ascii
3131
       file paths, this poses a difficulty because common text editors cannot
3095
       file paths, this poses a difficulty because common text editors cannot
3132
       handle multiple encodings in a single file. In this relatively
3096
       handle multiple encodings in a single file. In this relatively
3133
       unlikely case, you can edit the configuration file as two separate
3097
       unlikely case, you can edit the configuration file as two separate
3134
       text files with appropriate encodings, and concatenate them to create
3098
       text files with appropriate encodings, and concatenate them to create
3135
       the complete configuration.
3099
       the complete configuration.
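
   The two-file workaround above can be sketched in Python. This is only an
   illustration: the helper name concat_config_parts and the fragment file
   names used in the note below are hypothetical, not part of Recoll. The
   point is that the join is done on raw bytes, so each fragment keeps the
   encoding it was saved with.

```python
from pathlib import Path


def concat_config_parts(parts, output):
    """Join separately edited configuration fragments byte for byte.

    No decoding is done, so a fragment saved in the locale encoding and
    one saved in UTF-8 both survive unchanged in the result.
    """
    data = b""
    for part in parts:
        chunk = Path(part).read_bytes()
        # Make sure a fragment's last line does not run into the next one.
        if chunk and not chunk.endswith(b"\n"):
            chunk += b"\n"
        data += chunk
    Path(output).write_bytes(data)
    return data
```

   For instance, concat_config_parts(["paths.part", "unac.part"],
   "recoll.conf") would write the file path parameters first and the
   unac_except_trans line after them.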
3136
3100
3137
     ----------------------------------------------------------------------
3138
3139
  5.4.1. Main configuration file
3101
  5.4.1. Main configuration file
3140
3102
3141
   recoll.conf is the main configuration file. It defines things like what to
3103
   recoll.conf is the main configuration file. It defines things like what to
3142
   index (top directories and things to ignore), and the default character
3104
   index (top directories and things to ignore), and the default character
3143
   set to use for document types which do not specify it internally.
3105
   set to use for document types which do not specify it internally.
...
...
3148
   start the initial indexing, which may take some time.
3110
   start the initial indexing, which may take some time.
3149
3111
3150
   Most of the following parameters can be changed from the Index
3112
   Most of the following parameters can be changed from the Index
3151
   Configuration menu in the recoll interface. Some can only be set by
3113
   Configuration menu in the recoll interface. Some can only be set by
3152
   editing the configuration file.
3114
   editing the configuration file.
3153
3154
     ----------------------------------------------------------------------
3155
3115
3156
    5.4.1.1. Parameters affecting what documents we index:
3116
    5.4.1.1. Parameters affecting what documents we index:
3157
3117
3158
   topdirs
3118
   topdirs
3159
3119
...
...
3202
           indexed at startup, but not monitored.
3162
           indexed at startup, but not monitored.
3203
3163
3204
           Example of use for skipping text files only in a specific
3164
           Example of use for skipping text files only in a specific
3205
           directory:
3165
           directory:
3206
3166
3207
 skippedPaths = ~/somedir/..txt
3167
 skippedPaths = ~/somedir/*.txt
3208
              
3168
              
3209
3169
3210
   skippedPathsFnmPathname
3170
   skippedPathsFnmPathname
3211
3171
3212
           The values in the *skippedPaths variables are matched by default
3172
           The values in the *skippedPaths variables are matched by default
...
...
3273
           determining the mime type for a file (the main procedure uses
3233
           determining the mime type for a file (the main procedure uses
3274
           suffix associations as defined in the mimemap file). This can be
3234
           suffix associations as defined in the mimemap file). This can be
3275
           useful for files with suffix-less names, but it will also cause
3235
           useful for files with suffix-less names, but it will also cause
3276
           the indexing of many bogus "text" files.
3236
           the indexing of many bogus "text" files.
3277
3237
3278
   processbeaglequeue
3238
   processwebqueue
3279
3239
3280
           If this is set, process the directory where Beagle Web browser
3240
           If this is set, process the directory where Web browser plugins
3281
           plugins copy visited pages for indexing. Of course, Beagle MUST
3241
           copy visited pages for indexing.
3282
           NOT be running, else things will behave strangely.
3283
3242
3284
   beaglequeuedir
3243
   webqueuedir
3285
3244
3286
           The path to the Beagle indexing queue. This is hard-coded in the
3245
           The path to the web indexing queue. This is hard-coded in the
3287
           Beagle plugin as ~/.beagle/ToIndex so there should be no need to
3246
           Firefox plugin as ~/.recollweb/ToIndex so there should be no need
3288
           change it.
3247
           to change it.
3289
3290
     ----------------------------------------------------------------------
3291
3248
3292
    5.4.1.2. Parameters affecting how we generate terms:
3249
    5.4.1.2. Parameters affecting how we generate terms:
3293
3250
3294
   Changing some of these parameters will imply a full reindex. Also, when
3251
   Changing some of these parameters will imply a full reindex. Also, when
3295
   using multiple indexes, it may not make sense to search indexes that don't
3252
   using multiple indexes, it may not make sense to search indexes that don't
...
...
3405
           are to be set, they should be separated with a colon (':')
3362
           are to be set, they should be separated with a colon (':')
3406
           character (which there is currently no way to escape). For example:
3363
           character (which there is currently no way to escape). For example:
3407
           localfields= rclaptg=gnus:other = val, then select specifier
3364
           localfields= rclaptg=gnus:other = val, then select specifier
3408
           viewer with mimetype|tag=... in mimeview.
3365
           viewer with mimetype|tag=... in mimeview.
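
           The colon-separated format can be illustrated with a short
           Python sketch. The function name parse_localfields is
           hypothetical, and stripping the spaces around names and values
           is an assumption; this is not Recoll's own parsing code.

```python
def parse_localfields(value):
    """Split a 'name=value:name = value' string into a dict.

    The ':' separator cannot be escaped, so a value can never contain one.
    """
    fields = {}
    for item in value.split(":"):
        if not item.strip():
            continue  # ignore empty pieces such as a trailing ':'
        name, _, val = item.partition("=")
        fields[name.strip()] = val.strip()
    return fields


print(parse_localfields("rclaptg=gnus:other = val"))
# prints {'rclaptg': 'gnus', 'other': 'val'}
```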
3409
3366
3410
     ----------------------------------------------------------------------
3411
3412
    5.4.1.3. Parameters affecting where and how we store things:
3367
    5.4.1.3. Parameters affecting where and how we store things:
3413
3368
3414
   dbdir
3369
   dbdir
3415
3370
3416
           The name of the Xapian data directory. It will be created if
3371
           The name of the Xapian data directory. It will be created if
...
...
3442
           is really no sense in caching offsets for small files. The default
3397
           is really no sense in caching offsets for small files. The default
3443
           is 5 MB.
3398
           is 5 MB.
3444
3399
3445
   webcachedir
3400
   webcachedir
3446
3401
3447
           This is only used by the Beagle web browser plugin indexing code,
3402
           This is only used by the web browser plugin indexing code, and
3448
           and defines where the cache for visited pages will live. Default:
3403
           defines where the cache for visited pages will live. Default:
3449
           $RECOLL_CONFDIR/webcache
3404
           $RECOLL_CONFDIR/webcache
3450
3405
3451
   webcachemaxmbs
3406
   webcachemaxmbs
3452
3407
3453
           This is only used by the Beagle web browser plugin indexing code,
3408
           This is only used by the web browser plugin indexing code, and
3454
           and defines the maximum size for the web page cache. Default: 40
3409
           defines the maximum size for the web page cache. Default: 40 MB.
3455
           MB.
3456
3410
3457
   idxflushmb
3411
   idxflushmb
3458
3412
3459
           Threshold (megabytes of new text data) where we flush from memory
3413
           Threshold (megabytes of new text data) where we flush from memory
3460
           to disk index. Setting this can help control memory usage. A value
3414
           to disk index. Setting this can help control memory usage. A value
3461
           of 0 means no explicit flushing, letting Xapian use its own
3415
           of 0 means no explicit flushing, letting Xapian use its own
3462
           default, which is flushing every 10000 (or XAPIAN_FLUSH_THRESHOLD)
3416
           default, which is flushing every 10000 (or XAPIAN_FLUSH_THRESHOLD)
3463
           documents, which gives little memory usage control, as memory
3417
           documents, which gives little memory usage control, as memory
3464
           usage depends on average document size. The default value is 10.
3418
           usage also depends on average document size. The default value is
3465
3419
           10, and it is probably a bit low. If your system usually has free
3466
     ----------------------------------------------------------------------
3420
           memory, you can try higher values between 20 and 80. In my
3421
           experience, values beyond 100 are always counterproductive.
3467
3422
3468
    5.4.1.4. Miscellaneous parameters:
3423
    5.4.1.4. Miscellaneous parameters:
3469
3424
3470
   autodiacsens
3425
   autodiacsens
3471
3426
...
...
3575
           This allows defining location-related quirks for the mailbox
3530
           This allows defining location-related quirks for the mailbox
3576
           handler. Currently only the tbird flag is defined, and it should
3531
           handler. Currently only the tbird flag is defined, and it should
3577
           be set for directories which hold Thunderbird data, as their
3532
           be set for directories which hold Thunderbird data, as their
3578
           folder format is weird.
3533
           folder format is weird.
3579
3534
3580
     ----------------------------------------------------------------------
3581
3582
  5.4.2. The fields file
3535
  5.4.2. The fields file
3583
3536
3584
   This file contains information about dynamic fields handling in Recoll.
3537
   This file contains information about dynamic fields handling in Recoll.
3585
   Some very basic fields have hard-wired behaviour, and, mostly, you should
3538
   Some very basic fields have hard-wired behaviour, and, mostly, you should
3586
   not change the original data inside the fields file. But you can create
3539
   not change the original data inside the fields file. But you can create
...
...
3636
3589
3637
 [mail]
3590
 [mail]
3638
 # Extract the X-My-Tag mail header, and use it internally with the
3591
 # Extract the X-My-Tag mail header, and use it internally with the
3639
 # mailmytag field name
3592
 # mailmytag field name
3640
 x-my-tag = mailmytag
3593
 x-my-tag = mailmytag
3641
3642
     ----------------------------------------------------------------------
3643
3594
3644
  5.4.3. The mimemap file
3595
  5.4.3. The mimemap file
3645
3596
3646
   mimemap specifies the file name extension to mime type mappings.
3597
   mimemap specifies the file name extension to mime type mappings.
3647
3598
...
...
3663
   indexed (not even the file names are indexed for patterns in skippedNames).
3614
   indexed (not even the file names are indexed for patterns in skippedNames).
3664
   recoll_noindex is used mostly for things known to be unindexable by a
3615
   recoll_noindex is used mostly for things known to be unindexable by a
3665
   given Recoll version. Having it there avoids cluttering the more
3616
   given Recoll version. Having it there avoids cluttering the more
3666
   user-oriented and locally customized skippedNames.
3617
   user-oriented and locally customized skippedNames.
3667
3618
3668
     ----------------------------------------------------------------------
3669
3670
  5.4.4. The mimeconf file
3619
  5.4.4. The mimeconf file
3671
3620
3672
   mimeconf specifies how the different mime types are handled for indexing,
3621
   mimeconf specifies how the different mime types are handled for indexing,
3673
   and which icons are displayed in the recoll result lists.
3622
   and which icons are displayed in the recoll result lists.
3674
3623
...
...
3676
   except if you are a Recoll developer.
3625
   except if you are a Recoll developer.
3677
3626
3678
   The [icons] section allows you to change the icons which are displayed by
3627
   The [icons] section allows you to change the icons which are displayed by
3679
   recoll in the result lists (the values are the basenames of the png images
3628
   recoll in the result lists (the values are the basenames of the png images
3680
   inside the iconsdir directory, specified in recoll.conf).
3629
   inside the iconsdir directory, specified in recoll.conf).
3681
3682
     ----------------------------------------------------------------------
3683
3630
3684
  5.4.5. The mimeview file
3631
  5.4.5. The mimeview file
3685
3632
3686
   mimeview specifies which programs are started when you click on an Open
3633
   mimeview specifies which programs are started when you click on an Open
3687
   link in a result list. E.g., HTML is normally displayed using firefox, but
3634
   link in a result list. E.g., HTML is normally displayed using firefox, but
...
...
3719
   mydoc.doc.gz).
3666
   mydoc.doc.gz).
3720
3667
3721
   The right side of each assignment holds a command to be executed for
3668
   The right side of each assignment holds a command to be executed for
3722
   opening the file. The following substitutions are performed:
3669
   opening the file. The following substitutions are performed:
3723
3670
3724
     * %D. Document date
3671
     o %D. Document date
3725
3672
3726
     * %f. File name. This may be the name of a temporary file if it was
3673
     o %f. File name. This may be the name of a temporary file if it was
3727
       necessary to create one (e.g. to extract a subdocument from a
3674
       necessary to create one (e.g. to extract a subdocument from a
3728
       container).
3675
       container).
3729
3676
3730
     * %F. Original file name. Same as %f except if a temporary file is used.
3677
     o %F. Original file name. Same as %f except if a temporary file is used.
3731
3678
3732
     * %i. Internal path, for subdocuments of containers. The format depends
3679
     o %i. Internal path, for subdocuments of containers. The format depends
3733
       on the container type. If this appears in the command line, Recoll
3680
       on the container type. If this appears in the command line, Recoll
3734
       will not create a temporary file to extract the subdocument, expecting
3681
       will not create a temporary file to extract the subdocument, expecting
3735
       the called application (possibly a script) to be able to handle it.
3682
       the called application (possibly a script) to be able to handle it.
3736
3683
3737
     * %M. Mime type
3684
     o %M. Mime type
3738
3685
3739
     * %p. Page index. Only significant for a subset of document types,
3686
     o %p. Page index. Only significant for a subset of document types,
3740
       currently only PDF, Postscript and DVI files. Can be used to start the
3687
       currently only PDF, Postscript and DVI files. Can be used to start the
3741
       editor at the right page for a match or snippet.
3688
       editor at the right page for a match or snippet.
3742
3689
3743
     * %s. Search term. The value will only be set for documents with indexed
3690
     o %s. Search term. The value will only be set for documents with indexed
3744
       page numbers (e.g. PDF). The value will be one of the matched search
3691
       page numbers (e.g. PDF). The value will be one of the matched search
3745
       terms. It would allow pre-setting the value in the "Find" entry inside
3692
       terms. It would allow pre-setting the value in the "Find" entry inside
3746
       Evince for example, for easy highlighting of the term.
3693
       Evince for example, for easy highlighting of the term.
3747
3694
3748
     * %U, %u. Url.
3695
     o %U, %u. Url.
3749
3696
3750
   In addition to the predefined values above, all strings like %(fieldname)
3697
   In addition to the predefined values above, all strings like %(fieldname)
3751
   will be replaced by the value of the field named fieldname for the
3698
   will be replaced by the value of the field named fieldname for the
3752
   document. This could be used in combination with field customisation to
3699
   document. This could be used in combination with field customisation to
3753
   help with opening the document.
3700
   help with opening the document.
3754
3701
3755
     ----------------------------------------------------------------------
3756
3757
  5.4.6. Examples of configuration adjustments
3702
  5.4.6. Examples of configuration adjustments
3758
3703
3759
    5.4.6.1. Adding an external viewer for a non-indexed type
3704
    5.4.6.1. Adding an external viewer for a non-indexed type
3760
3705
3761
   Imagine that you have some kind of file which does not have indexable
3706
   Imagine that you have some kind of file which does not have indexable
...
...
3763
   the result list (when found by file name). The file names end in .blob and
3708
   the result list (when found by file name). The file names end in .blob and
3764
   can be displayed by application blobviewer.
3709
   can be displayed by application blobviewer.
3765
3710
3766
   You need two entries in the configuration files for this to work:
3711
   You need two entries in the configuration files for this to work:
3767
3712
3768
     * In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the
3713
     o In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the
3769
       following line:
3714
       following line:
3770
3715
3771
 .blob = application/x-blobapp
3716
 .blob = application/x-blobapp
3772
3717
3773
       Note that the mime type is made up here, and you could call it
3718
       Note that the mime type is made up here, and you could call it
3774
       diesel/oil just the same.
3719
       diesel/oil just the same.
3720
3775
     * In $RECOLL_CONFDIR/mimeview under the [view] section, add:
3721
     o In $RECOLL_CONFDIR/mimeview under the [view] section, add:
3776
3722
3777
 application/x-blobapp = blobviewer %f
3723
 application/x-blobapp = blobviewer %f
3778
3724
3779
       We are supposing that blobviewer wants a file name parameter here; you
3725
       We are supposing that blobviewer wants a file name parameter here; you
3780
       would use %u if it liked URLs better.
3726
       would use %u if it liked URLs better.
...
...
3783
   mime type which it already knows, you would just need to edit mimeview.
3729
   mime type which it already knows, you would just need to edit mimeview.
3784
   The entries you add in your personal file override those in the central
3730
   The entries you add in your personal file override those in the central
3785
   configuration, which you do not need to alter. mimeview can also be
3731
   configuration, which you do not need to alter. mimeview can also be
3786
   modified from the GUI.
3732
   modified from the GUI.
3787
3733
3788
     ----------------------------------------------------------------------
3789
3790
    5.4.6.2. Adding indexing support for a new file type
3734
    5.4.6.2. Adding indexing support for a new file type
3791
3735
3792
   Let us now imagine that the above .blob files actually contain indexable
3736
   Let us now imagine that the above .blob files actually contain indexable
3793
   text and that you know how to extract it with a command line program.
3737
   text and that you know how to extract it with a command line program.
3794
   Getting Recoll to index the files is easy. You need to perform the above
3738
   Getting Recoll to index the files is easy. You need to perform the above
3795
   alteration, and also to add data to the mimeconf file (typically in
3739
   alteration, and also to add data to the mimeconf file (typically in
3796
   ~/.recoll/mimeconf):
3740
   ~/.recoll/mimeconf):
3797
3741
3798
     * Under the [index] section, add the following line (more about the
3742
     o Under the [index] section, add the following line (more about the
3799
       rclblob indexing script later):
3743
       rclblob indexing script later):
3800
3744
3801
 application/x-blobapp = exec rclblob
3745
 application/x-blobapp = exec rclblob
3802
3746
3803
     * Under the [icons] section, you should choose an icon to be displayed
3747
     o Under the [icons] section, you should choose an icon to be displayed
3804
       for the files inside the result lists. Icons are normally 64x64 pixel
3748
       for the files inside the result lists. Icons are normally 64x64 pixel
3805
       PNG files which live in /usr/[local/]share/recoll/images.
3749
       PNG files which live in /usr/[local/]share/recoll/images.
3806
3750
3807
     * Under the [categories] section, you should add the mime type where it
3751
     o Under the [categories] section, you should add the mime type where it
3808
       makes sense (you can also create a category). Categories may be used
3752
       makes sense (you can also create a category). Categories may be used
3809
       for filtering in advanced search.
3753
       for filtering in advanced search.
3810
3754
3811
   The rclblob filter should be an executable program or script which exists
3755
   The rclblob filter should be an executable program or script which exists
3812
   inside /usr/[local/]share/recoll/filters. It will be given a file name as
3756
   inside /usr/[local/]share/recoll/filters. It will be given a file name as
3813
   argument and should output the text or html contents on the standard
3757
   argument and should output the text or html contents on the standard
3814
   output.
3758
   output.
3815
3759
3816
   The filter programming section describes in more detail how to write a
3760
   The filter programming section describes in more detail how to write a
3817
   filter.
3761
   filter.
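
   To make the contract concrete (one file name argument in, text on the
   standard output), here is a minimal Python sketch of what an rclblob
   script might look like. The .blob format is imaginary, so extract_text
   is a placeholder assumption: it treats the file as UTF-8 text with NUL
   padding. A real filter would call whatever extraction program you have.

```python
import sys


def extract_text(data: bytes) -> str:
    """Placeholder extraction step for the made-up .blob format."""
    return data.replace(b"\x00", b"").decode("utf-8", errors="replace")


def main(argv, out=sys.stdout):
    """Write the text of the file named in argv[1] to `out`."""
    if len(argv) != 2:
        print("usage: rclblob <file>", file=sys.stderr)
        return 1
    with open(argv[1], "rb") as handle:
        out.write(extract_text(handle.read()))
    return 0
```

   In an installed script, the last line would be something like
   raise SystemExit(main(sys.argv)), left out here so that the functions
   can be tried on their own.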
3818
3819
     ----------------------------------------------------------------------