a/src/README b/src/README
...
...
30
30
31
   2. Indexing
31
   2. Indexing
32
32
33
                2.1. Introduction
33
                2.1. Introduction
34
34
35
                             2.1.1. Indexing modes
36
37
                             2.1.2. Configurations, multiple indexes
38
39
                             2.1.3. Document types
40
41
                             2.1.4. Recovery
42
35
                2.2. Index storage
43
                2.2. Index storage
36
44
37
                             2.2.1. Xapian index formats
45
                             2.2.1. Xapian index formats
38
46
39
                             2.2.2. Security aspects
47
                             2.2.2. Security aspects
...
...
103
                3.6. Desktop integration
111
                3.6. Desktop integration
104
112
105
                             3.6.1. Hotkeying recoll
113
                             3.6.1. Hotkeying recoll
106
114
107
                             3.6.2. The KDE Kicker Recoll applet
115
                             3.6.2. The KDE Kicker Recoll applet
116
117
                3.7. Multiple databases
108
118
109
   4. Programming interface
119
   4. Programming interface
110
120
111
                4.1. Writing a document filter
121
                4.1. Writing a document filter
112
122
...
...
286
   Indexing is the process by which the set of documents is analyzed and the
296
   Indexing is the process by which the set of documents is analyzed and the
287
   data entered into the database. Recoll indexing is normally incremental:
297
   data entered into the database. Recoll indexing is normally incremental:
288
   documents will only be processed if they have been modified. On the first
298
   documents will only be processed if they have been modified. On the first
289
   execution, all documents will need processing. A full index build can be
299
   execution, all documents will need processing. A full index build can be
290
   forced later by specifying an option to the indexing command (recollindex
300
   forced later by specifying an option to the indexing command (recollindex
291
   -z).
301
   -z or -Z).
292
302
303
   The following sections give an overview of different aspects of the
304
   indexing processes and configuration, with links to detailed sections.
305
306
     ----------------------------------------------------------------------
307
308
  2.1.1. Indexing modes
309
293
   Recoll indexing can be performed with two different methods:
310
   Recoll indexing can be performed along two different modes:
294
311
295
     * Periodic (or Batch) indexing: indexing takes place at discrete times,
312
     * Periodic (or batch) indexing: indexing takes place at discrete times,
296
       by executing the recollindex command. The typical usage is to have a
313
       by executing the recollindex command. The typical usage is to have a
297
       nightly indexing run programmed into your cron file.
314
       nightly indexing run programmed into your cron file.
298
315
299
     * Real time indexing: indexing takes place as soon as a file is created
316
     * Real time indexing: indexing takes place as soon as a file is created
300
       or changed. recollindex runs as a daemon and uses a file system
317
       or changed. recollindex runs as a daemon and uses a file system
...
...
305
   they can be combined by setting up multiple indexes (e.g., use periodic
322
   they can be combined by setting up multiple indexes (e.g., use periodic
306
   indexing on a big documentation directory, and real time indexing on a
323
   indexing on a big documentation directory, and real time indexing on a
307
   small home directory). Monitoring a big file system tree can consume
324
   small home directory). Monitoring a big file system tree can consume
308
   significant system resources.
325
   significant system resources.
309
326
327
     ----------------------------------------------------------------------
328
329
  2.1.2. Configurations, multiple indexes
330
331
   The parameters describing what is to be indexed and local preferences are
332
   defined in text files contained in a configuration directory.
333
334
   All parameters have defaults, defined in system-wide files.
335
336
   Without further configuration, Recoll will index all appropriate files
337
   from your home directory, with a reasonable set of defaults.
338
339
   A default personal configuration directory ($HOME/.recoll/) is created
340
   when a Recoll program is first executed. It is possible to create other
341
   configuration directories, and use them by setting the RECOLL_CONFDIR
342
   environment variable, or giving the -c option to any of the Recoll
343
   commands.
344
345
   In some cases, it may be interesting to index different areas of the file
346
   system to separate databases. You can do this by using multiple
347
   configuration directories, each indexing a file system area to a specific
348
   database. Typically, this would be done to separate personal and shared
349
   indexes, or to take advantage of the organization of your data to improve
350
   search precision.
351
352
   The generated indexes can be queried concurrently in a transparent manner.
353
354
   For index generation, multiple configurations are totally independent from
355
   each other. When multiple indexes are used for searches, some parameters
356
   should be consistent among the configurations.
357
358
     ----------------------------------------------------------------------
359
360
  2.1.3. Document types
361
310
   Recoll knows about quite a few different document types. The parameters
362
   Recoll knows about quite a few different document types. The parameters
311
   for document types recognition and processing are set in configuration
363
   for document types recognition and processing are set in configuration
312
   files.
364
   files.
313
365
314
   Most file types, like HTML or word processing files, only hold one
366
   Most file types, like HTML or word processing files, only hold one
315
   document. Some file types, like email folders or zip archives, can hold
367
   document. Some file types, like email folders or zip archives, can hold
316
   many individually indexed documents, which may in turn be themselves
368
   many individually indexed documents, which may themselves be compound
317
   compound ones. Such hierarchies can go quite deep, and Recoll can process,
369
   ones. Such hierarchies can go quite deep, and Recoll can process, for
318
   for example, an ms-word document stored as an attachment to an email
370
   example, an ms-word document stored as an attachment to an email message
319
   message inside an email folder archived in a zip file...
371
   inside an email folder archived in a zip file...
320
372
321
   Recoll indexing processes plain text, HTML, OpenDocument
373
   Recoll indexing processes plain text, HTML, OpenDocument
322
   (Open/LibreOffice), email formats, and a few others internally.
374
   (Open/LibreOffice), email formats, and a few others internally.
323
375
324
   Other file types (e.g., postscript, pdf, ms-word, rtf ...) need external
376
   Other file types (e.g., postscript, pdf, ms-word, rtf ...) need external
...
...
327
   would be needed for indexing existing file types. This list can be
379
   would be needed for indexing existing file types. This list can be
328
   displayed by selecting the menu option File->Show Missing Helpers in the
380
   displayed by selecting the menu option File->Show Missing Helpers in the
329
   recoll GUI. It is stored in the missing text file inside the configuration
381
   recoll GUI. It is stored in the missing text file inside the configuration
330
   directory.
382
   directory.
331
383
332
   Without further configuration, Recoll will index all appropriate files
384
     ----------------------------------------------------------------------
333
   from your home directory, with a reasonable set of defaults.
334
385
335
   In some cases, it may be interesting to index different areas of the file
386
  2.1.4. Recovery
336
   system to separate databases. You can do this by using multiple
337
   configuration directories, each indexing a file system area to a specific
338
   database. See the section about using multiple databases for more
339
   information on multiple configurations and indexes.
340
387
341
   In the rare case where the index becomes corrupted (which can signal
388
   In the rare case where the index becomes corrupted (which can signal
342
   itself by weird search results or crashes), the index files need to be
389
   itself by weird search results or crashes), the index files need to be
343
   erased before restarting a clean indexing pass. Just delete the xapiandb
390
   erased before restarting a clean indexing pass. Just delete the xapiandb
344
   directory (see next section), or, alternatively, start the next
391
   directory (see next section), or, alternatively, start the next
...
...
377
       configuration section). This method would mainly be of use if you
424
       configuration section). This method would mainly be of use if you
378
       wanted to keep the configuration directory in its default location,
425
       wanted to keep the configuration directory in its default location,
379
       but desired another location for the index, typically out of disk
426
       but desired another location for the index, typically out of disk
380
       occupation concerns.
427
       occupation concerns.
381
428
382
   The size of the index is determined by the document set size, but the
429
   The size of the index is determined by the size of the set of documents,
383
   ratio can vary a lot. For a typical mixed set of documents, the index size
430
   but the ratio can vary a lot. For a typical mixed set of documents, the
384
   will often be close to the data set size. In specific cases (a set of
431
   index size will often be close to the data set size. In specific cases (a
385
   compressed mbox files for example), the index can become much bigger than
432
   set of compressed mbox files for example), the index can become much
386
   the documents. It may also be much smaller if the documents contain a lot
433
   bigger than the documents. It may also be much smaller if the documents
387
   of images or other non-indexed data (an extreme example being a set of mp3
434
   contain a lot of images or other non-indexed data (an extreme example
388
   files where only the tags would be indexed).
435
   being a set of mp3 files where only the tags would be indexed).
389
436
390
   Of course, images, sound and video do not increase the index size, which
437
   Of course, images, sound and video do not increase the index size, which
391
   means that nowadays (2012), typically, even a big index will be negligible
438
   means that nowadays (2012), typically, even a big index will be negligible
392
   against the total amount of data on the computer.
439
   against the total amount of data on the computer.
393
440
...
...
407
   format to the newer one. If you want to upgrade to the new format, or if a
454
   format to the newer one. If you want to upgrade to the new format, or if a
408
   very old index needs to be converted because its format is not supported
455
   very old index needs to be converted because its format is not supported
409
   any more, you will have to explicitly delete the old index, then run a
456
   any more, you will have to explicitly delete the old index, then run a
410
   normal indexing process.
457
   normal indexing process.
411
458
412
   Unfortunately, using the -z option to recollindex is not sufficient to
459
   Using the -z option to recollindex is not sufficient to change the format;
413
   change the format, you will have to delete all files inside the index
460
   you will have to delete all files inside the index directory (typically
414
   directory (typically ~/.recoll/xapiandb) before starting the indexing.
461
   ~/.recoll/xapiandb) before starting the indexing.
415
462
416
     ----------------------------------------------------------------------
463
     ----------------------------------------------------------------------
417
464
418
  2.2.2. Security aspects
465
  2.2.2. Security aspects
419
466
...
...
437
484
438
   Variables set inside the Recoll configuration files control which areas of
485
   Variables set inside the Recoll configuration files control which areas of
439
   the file system are indexed, and how files are processed. These variables
486
   the file system are indexed, and how files are processed. These variables
440
   can be set either by editing the text files or using the dialogs in the
487
   can be set either by editing the text files or using the dialogs in the
441
   recoll GUI.
488
   recoll GUI.
442
443
   You can also use multiple indexes defined by separate configurations,
444
   typically to separate personal and shared indexes, or to take advantage of
445
   the organization of your data to improve search precision.
446
489
447
   The first time you start recoll, you will be asked whether or not you
490
   The first time you start recoll, you will be asked whether or not you
448
   would like it to build the index. If you want to adjust the configuration
491
   would like it to build the index. If you want to adjust the configuration
449
   before indexing, just click Cancel at this point, which will get you into
492
   before indexing, just click Cancel at this point, which will get you into
450
   the configuration interface. If you exit at this point, recoll will have
493
   the configuration interface. If you exit at this point, recoll will have
...
...
457
   most immediately useful variable you may be interested in is probably
500
   most immediately useful variable you may be interested in is probably
458
   topdirs, which determines what subtrees get indexed.
501
   topdirs, which determines what subtrees get indexed.
459
502
460
   The applications needed to index file types other than text, HTML or email
503
   The applications needed to index file types other than text, HTML or email
461
   (e.g., pdf, postscript, ms-word...) are described in the external packages
504
   (e.g., pdf, postscript, ms-word...) are described in the external packages
462
   section
505
   section.
463
506
464
     ----------------------------------------------------------------------
507
     ----------------------------------------------------------------------
465
508
466
  2.3.1. The indexing configuration GUI
509
  2.3.1. The indexing configuration GUI
467
510
...
...
544
   because some operations which are normally performed at the end of the
587
   because some operations which are normally performed at the end of the
545
   indexing pass will have been skipped (for example, the stemming and
588
   indexing pass will have been skipped (for example, the stemming and
546
   spelling databases will be nonexistent or out of date). You just need to
589
   spelling databases will be nonexistent or out of date). You just need to
547
   restart indexing at a later time to restore consistency. The indexing will
590
   restart indexing at a later time to restore consistency. The indexing will
548
   restart at the interruption point (the full file tree will be traversed,
591
   restart at the interruption point (the full file tree will be traversed,
549
   but files that were indexed up to the interruption and are still up to
592
   but files that were indexed up to the interruption and for which the index
550
   date will not need to be reindexed).
593
   is still up to date will not need to be reindexed).
551
594
552
   recollindex has a number of other options which are described in its man
595
   recollindex has a number of other options which are described in its man
553
   page.
596
   page. Only a few will be described here.
554
597
598
   Option -z will reset the index when starting. This is almost the same as
599
   destroying the index files (the nuance is that the Xapian format version
600
   will not be changed).
601
602
   Option -Z will force the update of all documents without resetting the
603
   index first. This will not have the "clean start" aspect of -z, but the
604
   advantage is that the index will remain available for querying while it is
605
   rebuilt, which can be a significant advantage if it is very big (some
606
   installations need days for a full index rebuild).
607
555
   Of special interest maybe are the -i and -f options. -i allows indexing an
608
   Of special interest also, maybe, are the -i and -f options. -i allows
556
   explicit list of files (given as command line parameters or read on
609
   indexing an explicit list of files (given as command line parameters or
557
   stdin). -f tells recollindex to ignore file selection parameters from the
610
   read on stdin). -f tells recollindex to ignore file selection parameters
558
   configuration. Together, these options allow building a custom file
611
   from the configuration. Together, these options allow building a custom
559
   selection process for some area of the file system, by adding the top
612
   file selection process for some area of the file system, by adding the top
560
   directory to the skippedPaths list and using an appropriate file selection
613
   directory to the skippedPaths list and using an appropriate file selection
561
   method to build the file list to be fed to recollindex -if .
614
   method to build the file list to be fed to recollindex -if. Trivial
615
   example:
562
616
563
   recollindex -i will not descend into directory parameters, but just add
617
            find . -name indexable.txt -print | recollindex -if
564
   them as index entries. It is up to the external file selection method to
618
          
565
   build the complete file list.
619
620
   recollindex -i will not descend into subdirectories specified as
621
   parameters, but just add them as index entries. It is up to the external
622
   file selection method to build the complete file list.
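
   As a slightly fuller sketch of the same idea, the script below builds the
   complete file list with find before handing it to recollindex -if. The
   demonstration tree and its file names are hypothetical, and the guard
   around recollindex is only there so that the sketch does nothing harmful
   on a machine where Recoll is not installed:

```shell
#!/bin/sh
# Hypothetical demonstration tree; use a real directory in practice.
topdir=$(mktemp -d)
mkdir -p "$topdir/sub"
: > "$topdir/a.txt"
: > "$topdir/sub/b.txt"

# Build the complete file list ourselves. -type f keeps regular files only,
# so no directory name is ever passed to recollindex -i.
filelist=$(mktemp)
find "$topdir" -type f -print > "$filelist"

# Feed the list on stdin, as in the find example above.
if command -v recollindex >/dev/null 2>&1; then
    recollindex -if < "$filelist"
fi
```

   Because find does the descent, files in sub/ end up in the list even
   though recollindex -i itself would not look inside that directory.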
566
623
567
     ----------------------------------------------------------------------
624
     ----------------------------------------------------------------------
568
625
569
  2.5.2. Using cron to automate indexing
626
  2.5.2. Using cron to automate indexing
570
627
...
...
640
   quite big, depending on the log level.
697
   quite big, depending on the log level.
641
698
642
   When building Recoll, the real time indexing support can be customised
699
   When building Recoll, the real time indexing support can be customised
643
   during package configuration with the --with[out]-fam or
700
   during package configuration with the --with[out]-fam or
644
   --with[out]-inotify options. The default is currently to include inotify
701
   --with[out]-inotify options. The default is currently to include inotify
645
   monitoring on systems that support it, and, as of recoll 1.17, gamin
702
   monitoring on systems that support it, and, as of Recoll 1.17, gamin
646
   support on FreeBSD.
703
   support on FreeBSD.
647
704
648
   While it is convenient that data is indexed in real time, repeated
705
   While it is convenient that data is indexed in real time, repeated
649
   indexing can generate a significant load on the system when files such as
706
   indexing can generate a significant load on the system when files such as
650
   email folders change. Also, monitoring large file trees by itself
707
   email folders change. Also, monitoring large file trees by itself
...
...
771
   punctuation, newlines and all - except for wildcard characters (single ?
828
   punctuation, newlines and all - except for wildcard characters (single ?
772
   characters are ok). Recoll will process it and produce a meaningful
829
   characters are ok). Recoll will process it and produce a meaningful
773
   search. This is what most differentiates this mode from the Query Language
830
   search. This is what most differentiates this mode from the Query Language
774
   mode, where you have to care about the syntax.
831
   mode, where you have to care about the syntax.
775
832
776
   You can use the Tools / Advanced search dialog for more complex searches.
833
   You can use the Tools->Advanced search dialog for more complex searches.
777
834
778
     ----------------------------------------------------------------------
835
     ----------------------------------------------------------------------
779
836
780
  3.1.2. The default result list
837
  3.1.2. The default result list
781
838
...
...
922
979
923
   You can display successive or previous documents from the result list
980
   You can display successive or previous documents from the result list
924
   inside a preview tab by typing Shift+Down or Shift+Up (Down and Up are the
981
   inside a preview tab by typing Shift+Down or Shift+Up (Down and Up are the
925
   arrow keys).
982
   arrow keys).
926
983
927
   The preview tabs have an internal incremental search function. You
928
   initiate the search either by typing a / (slash) or Ctrl-F inside the text
929
   area or by clicking into the Search for: text field and entering the
930
   search string. You can then use the Next and Previous buttons to find the
931
   next/previous occurrence. You can also type F3 inside the text area to get
932
   to the next occurrence.
933
934
   If you have a search string entered and you use Ctrl-Up/Ctrl-Down to
935
   browse the results, the search is initiated for each successive document.
936
   If the string is found, the cursor will be positioned at the first
937
   occurrence of the search string.
938
939
   A right-click menu in the text area allows switching between displaying
984
   A right-click menu in the text area allows switching between displaying
940
   the main text or the contents of fields associated to the document (e.g.,
985
   the main text or the contents of fields associated to the document (e.g.,
941
   author, abstract, etc.). This is especially useful in cases where the term
986
   author, abstract, etc.). This is especially useful in cases where the term
942
   match did not occur in the main text but in one of the fields.
987
   match did not occur in the main text but in one of the fields. In the case
988
   of images, you can switch between three displays: the image itself, the
989
   image metadata as extracted by exiftool and the fields, which is the
990
   metadata stored in the index.
943
991
944
   You can print the current preview window contents by typing Ctrl-P (Ctrl +
992
   You can print the current preview window contents by typing Ctrl-P (Ctrl +
945
   P) in the window text.
993
   P) in the window text.
994
995
     ----------------------------------------------------------------------
996
997
    3.1.4.1. Searching inside the preview
998
999
   The preview window has an internal search capability, mostly controlled by
1000
   the panel at the bottom of the window, which works in two modes: as a
1001
   classical editor incremental search, where we look for the text entered in
1002
   the entry zone, or as a way to walk the matches between the document and
1003
   the Recoll query that found it.
1004
1005
   Incremental text search
1006
1007
           The preview tabs have an internal incremental search function. You
1008
           initiate the search either by typing a / (slash) or Ctrl-F inside
1009
           the text area or by clicking into the Search for: text field and
1010
           entering the search string. You can then use the Next and Previous
1011
           buttons to find the next/previous occurrence. You can also type F3
1012
           inside the text area to get to the next occurrence.
1013
1014
           If you have a search string entered and you use Ctrl-Up/Ctrl-Down
1015
           to browse the results, the search is initiated for each successive
1016
           document. If the string is found, the cursor will be positioned at
1017
           the first occurrence of the search string.
1018
1019
   Walking the match lists
1020
1021
           If the entry area is empty when you click the Next or Previous
1022
           buttons, the editor will be scrolled to show the next match to any
1023
           search term (the next highlighted zone). If you select a search
1024
           group from the dropdown list and click Next or Previous, the match
1025
           list for this group will be walked. This is not the same as a text
1026
           search, because the occurrences will include non-exact matches (as
1027
           caused by stemming or wildcards). The search will revert to the
1028
           text mode as soon as you edit the entry area.
946
1029
947
     ----------------------------------------------------------------------
1030
     ----------------------------------------------------------------------
948
1031
949
  3.1.5. Complex/advanced search
1032
  3.1.5. Complex/advanced search
950
1033
...
...
1102
1185
1103
     ----------------------------------------------------------------------
1186
     ----------------------------------------------------------------------
1104
1187
1105
  3.1.7. Multiple databases
1188
  3.1.7. Multiple databases
1106
1189
1107
   Multiple Recoll databases or indexes can be created by using several
1190
   See the section describing the use of multiple indexes for generalities.
1108
   configuration directories which are usually set to index different areas
1191
   Only the aspects concerning the recoll GUI are described here.
1109
   of the file system. A specific index can be selected for updating or
1110
   searching, using the RECOLL_CONFDIR environment variable or the -c option
1111
   to recoll and recollindex.
1112
1192
1113
   A recollindex program instance can only update one specific index.
1114
1115
   A recoll program instance is also associated with a specific index, which
1193
   A recoll program instance is always associated with a specific index,
1116
   is the one to be updated by its indexing thread, but it can use any number
1194
   which is the one to be updated when requested from the File menu, but it
1117
   of Recoll indexes for searching. The external indexes can be selected
1195
   can use any number of Recoll indexes for searching. The external indexes
1118
   through the external indexes tab in the preferences dialog.
1196
   can be selected through the external indexes tab in the preferences
1197
   dialog.
1119
1198
1120
   Index selection is performed in two phases. A set of all usable indexes
1199
   Index selection is performed in two phases. A set of all usable indexes
1121
   must first be defined, and then the subset of indexes to be used for
1200
   must first be defined, and then the subset of indexes to be used for
1122
   searching. Of course, these parameters are retained across program
1201
   searching. Of course, these parameters are retained across program
1123
   executions (they are kept separately for each Recoll configuration). The
1202
   executions (they are kept separately for each Recoll configuration). The
...
...
1134
   system administrator so that every user does not have to do it. The
1213
   system administrator so that every user does not have to do it. The
1135
   variable should define a colon-separated list of index directories, e.g.:
1214
   variable should define a colon-separated list of index directories, e.g.:
1136
1215
1137
 export RECOLL_EXTRA_DBS=/some/place/xapiandb:/some/other/db
1216
 export RECOLL_EXTRA_DBS=/some/place/xapiandb:/some/other/db
1138
1217
1139
   A typical usage scenario for the multiple index feature would be for a
1218
   Another environment variable, RECOLL_ACTIVE_EXTRA_DBS, allows adding to the
1140
   system administrator to set up a central index for shared data, that you
1219
   active list of indexes. This variable was suggested and implemented by a
1141
   choose to search or not in addition to your personal data. Of course,
1220
   Recoll user. It is mostly useful if you use scripts to mount external
1142
   there are other possibilities. There are many cases where you know the
1221
   volumes with Recoll indexes. By using RECOLL_EXTRA_DBS and
1143
   subset of files that should be searched, and where narrowing the search
1222
   RECOLL_ACTIVE_EXTRA_DBS, you can add and activate the index for the
1144
   can improve the results. You can achieve approximately the same effect
1223
   mounted volume when starting recoll.
1145
   with the directory filter in advanced search, but multiple indexes will
1224
1146
   have much better performance and may be worth the trouble.
1225
   RECOLL_ACTIVE_EXTRA_DBS is available for Recoll versions 1.17.2 and later.
1226
   A change was made in the same update so that recoll will automatically
1227
   deactivate unreachable indexes when starting up.
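
   A mount script could build these lists along the following lines. This is
   only a sketch: the mount points are hypothetical, join_existing is a
   helper invented for the example, and it assumes that
   RECOLL_ACTIVE_EXTRA_DBS takes the same colon-separated format as
   RECOLL_EXTRA_DBS, which is not stated here:

```shell
#!/bin/sh
# join_existing DIR...: print the directories that exist, joined with colons.
# (Helper invented for this sketch, not part of Recoll.)
join_existing() {
    list=""
    for db in "$@"; do
        if [ -d "$db" ]; then
            list="${list:+$list:}$db"
        fi
    done
    printf '%s\n' "$list"
}

# Hypothetical index directories on external volumes.
extra=$(join_existing /media/usb1/xapiandb /media/usb2/xapiandb)

if [ -n "$extra" ]; then
    # Make the reachable indexes known to recoll ...
    export RECOLL_EXTRA_DBS="$extra"
    # ... and activate them (assumed to use the same list format).
    export RECOLL_ACTIVE_EXTRA_DBS="$extra"
fi
```

   A recoll instance started from the same shell afterwards inherits both
   variables. Volumes that are not mounted are simply left out of the lists.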
1147
1228
1148
     ----------------------------------------------------------------------
1229
     ----------------------------------------------------------------------
1149
1230
1150
  3.1.8. Document history
1231
  3.1.8. Document history
1151
1232
...
...
1530
   The default value for the paragraph format string is:
1611
   The default value for the paragraph format string is:
1531
1612
1532
 <img src="%I" align="left">%R %S %L &nbsp;&nbsp;<b>%T</b><br>
1613
 <img src="%I" align="left">%R %S %L &nbsp;&nbsp;<b>%T</b><br>
1533
 %M&nbsp;%D&nbsp;&nbsp;&nbsp;<i>%U</i>&nbsp;%i<br>
1614
 %M&nbsp;%D&nbsp;&nbsp;&nbsp;<i>%U</i>&nbsp;%i<br>
1534
 %A %K
1615
 %A %K
1535
        
1536
1616
1537
   You may, for example, try the following for a more web-like experience:
1617
   You may, for example, try the following for a more web-like experience:
1538
1618
1539
 <u><b><a href="P%N">%T</a></b></u><br>
1619
 <u><b><a href="P%N">%T</a></b></u><br>
1540
 %A<font color=#008000>%U - %S</font> - %L
1620
 %A<font color=#008000>%U - %S</font> - %L
1541
        
1542
1621
1622
   Note that the P%N link in the above paragraph makes the title a preview
1543
   Or the clean looking:
1623
   link. Or the clean looking:
1544
1624
1545
 <img src="%I" align="left">%L <font color="#900000">%R</font>
1625
 <img src="%I" align="left">%L <font color="#900000">%R</font>
1546
   <b>%T</b><br>%S
1626
 &nbsp;&nbsp;<b>%T&</b><br>%S&nbsp;
1547
 <font color="#808080"><i>%U</i></font>
1627
 <font color="#808080"><i>%U</i></font>
1548
 <table bgcolor="#e0e0e0">
1628
 <table bgcolor="#e0e0e0">
1549
 <tr><td><div>%A</div></td></tr>
1629
 <tr><td><div>%A</div></td></tr>
1550
 </table>%K
1630
 </table>%K
1551
        
1552
1553
   Note that the P%N link in the above paragraph makes the title a preview
1554
   link.
1555
1631
1556
   These samples, and some others are on the web site, with pictures to show
1632
   These samples, and some others are on the web site, with pictures to show
1557
   how they look.
1633
   how they look.
1558
1634
1559
   It is also possible to define the value of the snippet separator inside
1635
   It is also possible to define the value of the snippet separator inside
...
...
1691
1767
1692
   The language is roughly based on the (seemingly defunct) Xesam user search
1768
   The language is roughly based on the (seemingly defunct) Xesam user search
1693
   language specification.
1769
   language specification.
1694
1770
1695
   If the results of a query language search puzzle you and you doubt what
1771
   If the results of a query language search puzzle you and you doubt what
1696
   has been actually searched for, you can use the GUI show query link at the
1772
   has been actually searched for, you can use the GUI Show Query link at the
1697
   top of the result list to check the exact query which was finally executed
1773
   top of the result list to check the exact query which was finally executed
1698
   by Xapian.
1774
   by Xapian.
1699
1775
1700
   Here follows a sample request that we are going to explain:
1776
   Here follows a sample request that we are going to explain:
1701
1777
...
...
1945
   a new recoll GUI instance every time (even if it is already running). You
2021
   a new recoll GUI instance every time (even if it is already running). You
1946
   may find it useful anyway.
2022
   may find it useful anyway.
1947
2023
1948
     ----------------------------------------------------------------------
2024
     ----------------------------------------------------------------------
1949
2025
2026
3.7. Multiple databases
2027
2028
   Multiple Recoll databases or indexes can be created by using several
2029
   configuration directories which are usually set to index different areas
2030
   of the file system. A specific index can be selected for updating or
2031
   searching, using the RECOLL_CONFDIR environment variable or the -c option
2032
   to recoll and recollindex.
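
   As an illustration of the two selection methods named above, here is a
   minimal Python sketch that builds a command line and environment for
   either method. The helper name recoll_invocation and the directory
   /path/to/shared-conf are hypothetical; only the RECOLL_CONFDIR variable
   and the -c option come from the text.

```python
import os


def recoll_invocation(program, confdir, extra_args=(), use_env=False):
    """Return (argv, env) selecting the index configured in confdir.

    Hypothetical helper: 'program' is expected to be "recoll" or
    "recollindex"; nothing is executed here.
    """
    env = dict(os.environ)  # copy, so the caller's environment is untouched
    argv = [program]
    if use_env:
        # Method 1: select the configuration through the environment.
        env["RECOLL_CONFDIR"] = confdir
    else:
        # Method 2: select the configuration with the -c option.
        argv += ["-c", confdir]
    argv += list(extra_args)
    return argv, env


# Example: prepare an update of the index described by a shared configuration.
argv, env = recoll_invocation("recollindex", "/path/to/shared-conf")
```

   The resulting pair could then be handed to subprocess.run(argv, env=env)
   to start the program against the chosen index.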
2033
2034
   A typical usage scenario for the multiple index feature would be for a
2035
   system administrator to set up a central index for shared data, which you can
2036
   choose to search or not in addition to your personal data. Of course,
2037
   there are other possibilities. There are many cases where you know the
2038
   subset of files that should be searched, and where narrowing the search
2039
   can improve the results. You can achieve approximately the same effect
2040
   with the directory filter in advanced search, but multiple indexes will
2041
   have much better performance and may be worth the trouble.
2042
2043
   A recollindex program instance can only update one specific index.
2044
2045
   The main index (defined by RECOLL_CONFDIR or -c) is always active. If this
2046
   is undesirable, you can set up your base configuration to index an empty
2047
   directory.
2048
2049
   The different search interfaces (GUI, command line, ...) have different
2050
   methods to define the set of indexes to be used; see the appropriate
2051
   section.
2052
2053
   If multiple indexes are to be used together for searches, some
2054
   configuration parameters must be consistent among the set. These are
2055
   parameters which need to be the same when indexing and searching. As the
2056
   parameters come from the main configuration when searching, they need to
2057
   be compatible with what was set when creating the other indexes (which
2058
   came from their respective configuration directories). Most of the relevant
2059
   parameters are described in the following linked section.
2060
2061
     ----------------------------------------------------------------------
2062
1950
                        Chapter 4. Programming interface
2063
                        Chapter 4. Programming interface
1951
2064
1952
   Recoll has an Application programming Interface, usable both for indexing
2065
   Recoll has an Application programming Interface, usable both for indexing
1953
   and searching, currently accessible from the Python language.
2066
   and searching, currently accessible from the Python language.
1954
2067
...
...
2014
   the filter if the operation is for indexing or previewing. Some filters
2127
   the filter if the operation is for indexing or previewing. Some filters
2015
   use this to output a slightly different format, for example stripping
2128
   use this to output a slightly different format, for example stripping
2016
   uninteresting repeated keywords (ie: Subject: for email) when indexing.
2129
   uninteresting repeated keywords (ie: Subject: for email) when indexing.
2017
   This is not essential.
2130
   This is not essential.
2018
2131
2019
   You should look to one of the simple filters, for example rclps for a
2132
   You should look at one of the simple filters, for example rclps for a
2020
   starting point.
2133
   starting point.
2021
2134
2022
   Don't forget to make your filter executable before testing !
2135
   Don't forget to make your filter executable before testing !
2023
2136
2024
     ----------------------------------------------------------------------
2137
     ----------------------------------------------------------------------
...
...
2617
2730
2618
     * QTDIR should point to the directory above the one that holds the qt
2731
     * QTDIR should point to the directory above the one that holds the qt
2619
       include files (ie: if qt.h is /usr/local/qt/include/qt.h, QTDIR should
2732
       include files (ie: if qt.h is /usr/local/qt/include/qt.h, QTDIR should
2620
       be /usr/local/qt).
2733
       be /usr/local/qt).
2621
2734
2622
     * QMAKESPECS should be set to the name of one of the qt mkspecs
2735
     * QMAKESPECS should be set to the name of one of the Qt mkspecs
2623
       sub-directories (ie: linux-g++).
2736
       sub-directories (ie: linux-g++).
2624
2737
2625
   On many Linux systems, QTDIR is set by the login scripts, and QMAKESPECS
2738
   On many Linux systems, QTDIR is set by the login scripts, and QMAKESPECS
2626
   is not needed because there is a default link in mkspecs/.
2739
   is not needed because there is a default link in mkspecs/.
2627
2740
...
...
2983
   defaultcharset
3096
   defaultcharset
2984
3097
2985
           The name of the character set used for files that do not contain a
3098
           The name of the character set used for files that do not contain a
2986
           character set definition (ie: plain text files). This can be
3099
           character set definition (ie: plain text files). This can be
2987
           redefined for any sub-directory. If it is not set at all, the
3100
           redefined for any sub-directory. If it is not set at all, the
2988
           character set used is the one defined by the nls environment
3101
           character set used is the one defined by the nls environment (
2989
           (LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
3102
           LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
2990
3103
2991
   unac_except_trans
3104
   unac_except_trans
2992
3105
2993
           This is a list of characters, encoded in UTF-8, which should be
3106
           This is a list of characters, encoded in UTF-8, which should be
2994
           handled specially when converting text to unaccented lowercase.
3107
           handled specially when converting text to unaccented lowercase.