a/src/README b/src/README
...
...
10
10
11
   Copyright (c) 2005 Jean-Francois Dockes
11
   Copyright (c) 2005 Jean-Francois Dockes
12
12
13
   This document introduces full text search notions and describes the
13
   This document introduces full text search notions and describes the
14
   installation and use of the Recoll application. It currently describes
14
   installation and use of the Recoll application. It currently describes
15
   Recoll 1.12.
15
   Recoll 1.12-1.13.
16
17
   [ Split HTML / Single HTML ]
16
18
17
     ----------------------------------------------------------------------
19
     ----------------------------------------------------------------------
18
20
19
   Table of Contents
21
   Table of Contents
20
22
...
...
38
40
39
                2.3. Indexing configuration
41
                2.3. Indexing configuration
40
42
41
                             2.3.1. The indexing configuration GUI
43
                             2.3.1. The indexing configuration GUI
42
44
45
                2.4. Using Beagle WEB browser plugins
46
43
                2.4. Periodic indexing
47
                2.5. Periodic indexing
44
48
45
                             2.4.1. Starting indexing
49
                             2.5.1. Starting indexing
46
50
47
                             2.4.2. Using cron to automate indexing
51
                             2.5.2. Using cron to automate indexing
48
52
49
                2.5. Real time indexing
53
                2.6. Real time indexing
50
54
51
   3. Searching with the Qt graphical user interface
55
   3. Searching with the Qt graphical user interface
52
56
53
                3.1. Simple search
57
                3.1. Simple search
54
58
...
...
80
84
81
                             3.11.3. Others
85
                             3.11.3. Others
82
86
83
                3.12. Customizing the search interface
87
                3.12. Customizing the search interface
84
88
89
                             3.12.1. The result list paragraph format
90
85
   4. Searching with the KDE KIO slave
91
   4. Searching with the KDE KIO slave
86
92
87
                4.1. What's this
93
                4.1. What's this
88
94
89
                4.2. Searchable documents
95
                4.2. Searchable documents
...
...
104
110
105
                             6.3.2. Python interface
111
                             6.3.2. Python interface
106
112
107
   7. Installation
113
   7. Installation
108
114
109
                7.1. Installing a prebuilt copy
115
                7.1. Installing a binary copy
110
116
111
                             7.1.1. Installing through a package system
117
                             7.1.1. Installing through a package system
112
118
113
                             7.1.2. Installing a prebuilt Recoll
119
                             7.1.2. Installing a prebuilt Recoll
114
120
...
...
271
   
277
   
272
278
273
   Recoll knows about quite a few different document types. The parameters
279
   Recoll knows about quite a few different document types. The parameters
274
   for document types recognition and processing are set in configuration
280
   for document types recognition and processing are set in configuration
275
   files. Most file types, like HTML or word processing files, only hold one
281
   files. Most file types, like HTML or word processing files, only hold one
276
   document. Some file types, like mail folder files can hold many
282
   document. Some file types, like mail folder files, can hold many
277
   individually indexed documents.
283
   individually indexed documents.
278
284
279
   Recoll indexing processes plain text, HTML, openoffice and e-mail files
285
   Recoll indexing processes plain text, HTML, openoffice and e-mail files
280
   internally.
286
   internally (a few more actually).
281
287
282
   Other file types (ie: postscript, pdf, ms-word, rtf ...) need external
288
   Other file types (ie: postscript, pdf, ms-word, rtf ...) need external
283
   applications for preprocessing. The list is in the installation section.
289
   applications for preprocessing. The list is in the installation section.
284
   After every indexing operation, Recoll updates a list of commands that
290
   After every indexing operation, Recoll updates a list of commands that
285
   would be needed for indexing existing files types. This list can be
291
   would be needed for indexing existing files types. This list can be
...
...
293
   system to separate databases. You can do this by using multiple
299
   system to separate databases. You can do this by using multiple
294
   configuration directories, each indexing a file system area to a specific
300
   configuration directories, each indexing a file system area to a specific
295
   database. See the section about using multiple databases for more
301
   database. See the section about using multiple databases for more
296
   information on multiple configurations and indexes.
302
   information on multiple configurations and indexes.
297
303
304
   In the rare case where the index becomes corrupted (which can signal
305
   itself by weird search results or crashes), the index files need to be
306
   erased before restarting a clean indexing pass. Just delete the xapiandb
307
   directory (see next section), or, alternatively, start the next
308
   recollindex with the -z option, which will reset the database before
309
   indexing.
310
298
     ----------------------------------------------------------------------
311
     ----------------------------------------------------------------------
299
312
300
2.2. Index storage
313
2.2. Index storage
301
314
302
   The default location for the index data is the xapiandb subdirectory of
315
   The default location for the index data is the xapiandb subdirectory of
...
...
327
       configuration section). This method would mainly be of use if you
340
       configuration section). This method would mainly be of use if you
328
       wanted to keep the configuration directory in its default location,
341
       wanted to keep the configuration directory in its default location,
329
       but desired another location for the index, typically out of disk
342
       but desired another location for the index, typically out of disk
330
       occupation concerns.
343
       occupation concerns.
331
344
332
   The size of the index is determined by the size of the set of documents,
345
   The size of the index is determined by the document set size, but the
333
   but the ratio can vary a lot. For a typical mixed set of documents, the
346
   ratio can vary a lot. For a typical mixed set of documents, the index size
334
   index size will often be close to the data set size. In specific cases (a
347
   will often be close to the data set size. In specific cases (a set of
335
   set of compressed mbox files for example), the index can become much
348
   compressed mbox files for example), the index can become much bigger than
336
   bigger than the documents. It may also be much smaller if the documents
349
   the documents. It may also be much smaller if the documents contain a lot
337
   contain a lot of images or other non-indexed data (an extreme example
350
   of images or other non-indexed data (an extreme example being a set of mp3
338
   being a set of mp3 files where only the tags would be indexed).
351
   files where only the tags would be indexed).
339
352
340
   Of course, images, sound and video do not increase the index size, which
353
   Of course, images, sound and video do not increase the index size, which
341
   means that it will be quite typical nowadays (2006), that even a big index
354
   means that it will be quite typical nowadays (2006), that even a big index
342
   will be negligible against the total amount of data on the computer.
355
   will be negligible against the total amount of data on the computer.
343
356
...
...
403
   You can also use multiple indexes defined by separate configurations,
416
   You can also use multiple indexes defined by separate configurations,
404
   typically to separate personal and shared indexes, or to take advantage of
417
   typically to separate personal and shared indexes, or to take advantage of
405
   the organization of your data to improve search precision.
418
   the organization of your data to improve search precision.
406
419
407
   The first time you start recoll, you will be asked whether or not you
420
   The first time you start recoll, you will be asked whether or not you
408
   would like recoll to build the index. If you want to adjust the
421
   would like it to build the index. If you want to adjust the configuration
409
   configuration before indexing, just click Cancel at this point. That way,
422
   before indexing, just click Cancel at this point, which will get you into
410
   recoll will have created a ~/.recoll directory containing empty
423
   the configuration interface. If you exit, recoll will have created a
411
   configuration files.
424
   ~/.recoll directory containing empty configuration files, which you can
425
   edit by hand.
412
426
413
   The configuration is documented inside the installation chapter of this
427
   The configuration is documented inside the installation chapter of this
414
   document, or in the recoll.conf(5) man page, but the most current
428
   document, or in the recoll.conf(5) man page, but the most current
415
   information will most likely be the comments inside the sample file. The
429
   information will most likely be the comments inside the sample file. The
416
   most immediately useful variable you may be interested in is probably
430
   most immediately useful variable you may be interested in is probably
...
...
445
   use it on hand-edited files, which you might nevertheless want to backup
459
   use it on hand-edited files, which you might nevertheless want to backup
446
   first...
460
   first...
447
461
448
     ----------------------------------------------------------------------
462
     ----------------------------------------------------------------------
449
463
464
2.4. Using Beagle WEB browser plugins
465
466
   Beagle is a concurrent desktop indexer, built on Lucene and the Mono
467
   project (C#), for which a number of add-on browser plugins were written.
468
   These work by copying visited web pages to an indexing queue directory,
469
   which the indexer then processes.
470
471
   If, for any reason, you so happen to prefer Recoll to Beagle, you can
472
   still use the browser plugins (they are written in Javascript and
473
   completely independent of C#, Beagle, Lucene...). Recoll can process the
474
   Beagle queue directory. Of course, this supposes that Beagle is not
475
   running, else both programs will fight for the same files.
476
477
   This feature can be enabled in the GUI indexing configuration panel, or by
478
   editing the configuration file (set processbeaglequeue to 1).
479
480
     ----------------------------------------------------------------------
481
450
2.4. Periodic indexing
482
2.5. Periodic indexing
451
483
452
  2.4.1. Starting indexing
484
  2.5.1. Starting indexing
453
485
454
   Indexing is performed either by the recollindex program, or by the
486
   Indexing is performed either by the recollindex program, or by the
455
   indexing thread inside the recoll program (use the File menu). Both
487
   indexing thread inside the recoll program (use the File menu). Both
456
   programs will use the RECOLL_CONFDIR variable or accept a -c confdir
488
   programs will use the RECOLL_CONFDIR variable or accept a -c confdir
457
   option to specify a non-default configuration directory.
489
   option to specify a non-default configuration directory.
458
490
459
   If the recoll program finds no index when it starts, it will automatically
491
   If the recoll program finds no index when it starts, it will automatically
460
   start indexing (except if canceled).
492
   start indexing (except if canceled).
461
493
462
   It is best to avoid interrupting the indexing process, as this may
494
   The indexing process can be interrupted by sending an interrupt (^C,
463
   sometimes leave the index in a bad state. This is not a serious problem,
495
   SIGINT) or terminate (SIGTERM) signal. Some time may elapse before the
464
   as you then just need to delete the index files and restart the indexing.
496
   process exits, because it needs to properly flush and close the index. The
465
   The index files are normally stored in the $HOME/.recoll/xapiandb
497
   indexing will restart at the interruption point the next time (the full
466
   directory, which you can just delete if needed. Alternatively, you can
498
   file tree will still be traversed, but files that were indexed up to the
467
   start recollindex with option -z, which will reset the database before
499
   interruption and are still up to date will not need to be reindexed).
468
   indexing.
469
500
470
     ----------------------------------------------------------------------
501
   After such an interruption, the index will be somewhat inconsistent
502
   because some operations which are normally performed at the end of the
503
   indexing pass will have been skipped (for example, the stemming and
504
   spelling databases will be nonexistent or out of date). You just need to
505
   restart indexing at a later time to restore consistency.
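
   The stop-and-wait sequence described above can be sketched in shell.
   This is only an illustration: `sleep 30` is a stand-in for a running
   recollindex (an assumption, so that the snippet runs on any system), and
   the exit status shown by the last line follows the usual shell convention
   for a signalled process, not documented Recoll behaviour.

```shell
# Stand-in for a long-running indexer (assumption for this sketch);
# on a real system this would be: recollindex
indexer_cmd="sleep 30"

# Start the process in the background and remember its process id.
$indexer_cmd &
pid=$!

# Request termination with SIGTERM (typing ^C in a terminal sends SIGINT).
kill -TERM "$pid"

# Block until the process has really exited. A real indexer may need some
# time here, because it flushes and closes the index before returning.
status=0
wait "$pid" || status=$?

echo "indexer exited with status $status"
```

   Running recollindex again later resumes the work and restores the
   consistency of the index, as explained above.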
471
506
507
     ----------------------------------------------------------------------
508
472
  2.4.2. Using cron to automate indexing
509
  2.5.2. Using cron to automate indexing
473
510
474
   The most common way to set up indexing is to have a cron task execute it
511
   The most common way to set up indexing is to have a cron task execute it
475
   every night. For example the following crontab entry would do it every day
512
   every night. For example the following crontab entry would do it every day
476
   at 3:30AM (supposing recollindex is in your PATH):
513
   at 3:30AM (supposing recollindex is in your PATH):
477
514
478
 30 3 * * * recollindex > /tmp/recolltrace 2>&1
515
 30 3 * * * recollindex > /some/tmp/dir/recolltrace 2>&1
516
517
   Or, using anacron:
518
519
 1  15  su mylogin -c "recollindex recollindex > /tmp/rcltraceme 2>&1"
479
520
480
   The usual command to edit your crontab is crontab -e (which will usually
521
   The usual command to edit your crontab is crontab -e (which will usually
481
   start the vi editor to edit the file). You may have more sophisticated
522
   start the vi editor to edit the file). You may have more sophisticated
482
   tools available on your system.
523
   tools available on your system.
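
   As a non-interactive alternative to crontab -e, the entry can be merged
   into the text of the existing crontab from a script. The sketch below
   assumes a standard crontab command that accepts -l to list the current
   table; it only prints the merged table and installs nothing. The variable
   names are illustrative.

```shell
# Crontab line taken from the example above.
entry='30 3 * * * recollindex > /some/tmp/dir/recolltrace 2>&1'

# Current crontab, or an empty string if the user has none yet.
current=$(crontab -l 2>/dev/null || true)

if printf '%s\n' "$current" | grep -qxF -- "$entry"; then
    new_crontab=$current            # already scheduled, keep the table as is
elif [ -z "$current" ]; then
    new_crontab=$entry              # no table yet, the entry is the table
else
    new_crontab=$(printf '%s\n%s' "$current" "$entry")
fi

# Review the result first; to install it, pipe the same text to: crontab -
printf '%s\n' "$new_crontab"
```

   Checking for the entry before appending keeps the script safe to run more
   than once.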
483
524
484
     ----------------------------------------------------------------------
525
     ----------------------------------------------------------------------
485
526
486
2.5. Real time indexing
527
2.6. Real time indexing
487
528
488
   Real time monitoring/indexing is performed by starting the recollindex -m
529
   Real time monitoring/indexing is performed by starting the recollindex -m
489
   command. With this option, recollindex will detach from the terminal and
530
   command. With this option, recollindex will detach from the terminal and
490
   become a daemon, permanently monitoring file changes and updating the
531
   become a daemon, permanently monitoring file changes and updating the
491
   index.
532
   index.
...
...
511
552
512
   The indexing daemon gets started, then the window manager, for which the
553
   The indexing daemon gets started, then the window manager, for which the
513
   session waits.
554
   session waits.
514
555
515
   By default the indexing daemon will monitor the state of the X11 session,
556
   By default the indexing daemon will monitor the state of the X11 session,
516
   and exit when it finishes, it is not necessary to kill it explicitly.
557
   and exit when it finishes, it is not necessary to kill it explicitly. (The
517
   (The X11 server monitoring can be disabled with option -x to recollindex).
558
   X11 server monitoring can be disabled with option -x to recollindex).
518
559
519
   Under KDE, you can place a small script to start recollindex -m under
560
   Under KDE, you can place a small script to start recollindex -m under
520
   $HOME/.kde/Autostart. This will be executed when the session begins.
561
   $HOME/.kde/Autostart. This will be executed when the session begins.
521
562
522
   There is a similar mechanism under Gnome (find the session control tool in
563
   There is a similar mechanism under Gnome (find the session control tool in
523
   the menus and use the "Startup programs" tab).
564
   the menus and use the "Startup programs" tab).
524
565
525
   By default, the indexing daemon will write its messages to a file inside
566
   By default, the messages from the indexing daemon will be discarded. You
526
   the configuration directory (this is controlled by the daemlogfilename and
567
   may want to change this by setting the daemlogfilename and daemloglevel
527
   daemloglevel configuration parameters). You may want to change this. Also
568
   configuration parameters. Also the log file will only be truncated when
528
   the log file will only be truncated when the daemon starts. If the daemon
569
   the daemon starts. If the daemon runs permanently, the log file may grow
529
   runs permanently, the log file may grow quite big, depending on the log
570
   quite big, depending on the log level.
530
   level.
531
571
532
   While it is convenient that data is indexed in real time, repeated
572
   While it is convenient that data is indexed in real time, repeated
533
   indexing can generate a significant load on the system when files such as
573
   indexing can generate a significant load on the system when files such as
534
   email folders change. Also, monitoring large file trees by itself
574
   email folders change. Also, monitoring large file trees by itself
535
   significantly taxes system resources. You probably do not want to enable
575
   significantly taxes system resources. You probably do not want to enable
...
...
582
   better scores). Any term will search for documents where at least one of
622
   better scores). Any term will search for documents where at least one of
583
   the terms appear.
623
   the terms appear.
584
624
585
   File name will specifically look for file names. The entry will be split
625
   File name will specifically look for file names. The entry will be split
586
   at white space characters, and each pattern will be separately expanded.
626
   at white space characters, and each pattern will be separately expanded.
587
   If you want to search for a pattern including white space, you need to use
627
   If you want to search for a pattern including white space, use double
588
   double quotes. The point of having a separate file name search is that
628
   quotes. The point of having a separate file name search is that wild card
589
   wild card expansion can be performed more efficiently on a relatively
629
   expansion can be performed more efficiently on a relatively small subset
590
   small subset of the index.
630
   of the index.
591
631
592
   The fourth entry (Query Language) is described in its own section.
632
   The fourth entry (Query Language) is described in its own section.
593
633
594
   All search modes allow wildcards inside terms (*, ?, []). You may want to
634
   All search modes allow wildcards inside terms (*, ?, []). You may want to
595
   have a look at the section about wildcards for more information about
635
   have a look at the section about wildcards for more information about
...
...
599
   enclosing the input inside double quotes. Ex: "virtual reality".
639
   enclosing the input inside double quotes. Ex: "virtual reality".
600
640
601
   Character case has no influence on search, except that you can disable
641
   Character case has no influence on search, except that you can disable
602
   stem expansion for any term by capitalizing it. Ie: a search for floor
642
   stem expansion for any term by capitalizing it. Ie: a search for floor
603
   will also normally look for flooring, floored, etc., but a search for
643
   will also normally look for flooring, floored, etc., but a search for
604
   Floor will only look for floor, in any character case. Sstemming can also
644
   Floor will only look for floor, in any character case. Stemming can also
605
   be disabled globally in the preferences.
645
   be disabled globally in the preferences.
606
646
607
   Recoll remembers the last few searches that you performed. You can use the
647
   Recoll remembers the last few searches that you performed. You can use the
608
   simple search text entry widget (a combobox) to recall them (click on the
648
   simple search text entry widget (a combobox) to recall them (click on the
609
   thing at the right of the text field). Please note, however, that only the
649
   thing at the right of the text field). Please note, however, that only the
...
...
614
   extracted from the database.
654
   extracted from the database.
615
655
616
   Double-clicking on a word in the result list or a preview window will
656
   Double-clicking on a word in the result list or a preview window will
617
   insert it into the simple search entry field.
657
   insert it into the simple search entry field.
618
658
619
   Note that, apart from wildcard characters (single ? characters are ok),
620
   you can cut and paste any text into an All terms or Any term search field,
659
   You can cut and paste any text into an All terms or Any term search field,
621
   punctuation, newlines and all. Recoll will process it and produce a
660
   punctuation, newlines and all - except for wildcard characters (single ?
661
   characters are ok). Recoll will process it and produce a meaningful
622
   meaningful search. This is what most differentiates this mode from the
662
   search. This is what most differentiates this mode from the Query Language
623
   Query Language mode, where you have to care about the syntax.
663
   mode, where you have to care about the syntax.
624
664
625
   You can use the Tools / Advanced search dialog for more complex searches.
665
   You can use the Tools / Advanced search dialog for more complex searches.
626
666
627
     ----------------------------------------------------------------------
667
     ----------------------------------------------------------------------
628
668
...
...
640
   open tabs in the existing preview window. You can use Shift+Click to force
680
   open tabs in the existing preview window. You can use Shift+Click to force
641
   the creation of another preview window, which may be useful to view the
681
   the creation of another preview window, which may be useful to view the
642
   documents side by side. (You can also browse successive results in a
682
   documents side by side. (You can also browse successive results in a
643
   single preview window by typing Shift+ArrowUp/Down in the window).
683
   single preview window by typing Shift+ArrowUp/Down in the window).
644
684
645
   Clicking the Edit link will attempt to start an external editor. The
685
   Clicking the Open link will attempt to start an external viewer. The
646
   editors can be configured through the user preferences dialog, or by
686
   viewer for each document type can be configured through the user
647
   editing the mimeview configuration file.
687
   preferences dialog, or by editing the mimeview configuration file. You can
688
   also check the Use desktop preferences option in the user preferences
689
   dialog to use the desktop defaults for all documents. This is probably the
690
   best option if you are using a well configured Gnome or KDE desktop.
648
691
649
   The Preview and Edit edit links may not be present for all entries,
692
   The Preview and Open edit links may not be present for all entries,
650
   meaning that Recoll has no configured way to preview a given file type
693
   meaning that Recoll has no configured way to preview a given file type
651
   (which was indexed by name only), or no configured external editor for the
694
   (which was indexed by name only), or no configured external editor for the
652
   file type. This can sometimes be adjusted simply by tweaking the mimemap
695
   file type. This can sometimes be adjusted simply by tweaking the mimemap
653
   and mimeview configuration files (the latter can be modified with the user
696
   and mimeview configuration files (the latter can be modified with the user
654
   preferences dialog).
697
   preferences dialog).
...
...
685
728
686
     * Save to File
729
     * Save to File
687
730
688
     * Find similar
731
     * Find similar
689
732
733
     * Preview Parent document
734
690
     * Parent document
735
     * Open Parent document
691
736
692
   The Preview and Edit entries do the same thing as the corresponding links.
737
   The Preview and Edit entries do the same thing as the corresponding links.
693
738
694
   The Copy File Name and Copy Url copy the relevant data to the clipboard,
739
   The Copy File Name and Copy Url copy the relevant data to the clipboard,
695
   for later pasting.
740
   for later pasting.
...
...
703
   The Find similar entry will select a number of relevant terms from the
748
   The Find similar entry will select a number of relevant terms from the
704
   current document and enter them into the simple search field. You can then
749
   current document and enter them into the simple search field. You can then
705
   start a simple search, with a good chance of finding documents related to
750
   start a simple search, with a good chance of finding documents related to
706
   the current result.
751
   the current result.
707
752
708
   The Parent document entry will appear for documents which are not actually
753
   The Parent document entries will appear for documents which are not
709
   files but are part of, or attached to, a higher level document. This entry
754
   actually files but are part of, or attached to, a higher level document.
710
   is mainly useful for email attachments and permits viewing the message to
755
   This entry is mainly useful for email attachments and permits viewing the
711
   which the document is attached. Note that the entry will also appear for
756
   message to which the document is attached. Note that the entry will also
712
   an email which is part of an mbox folder file, but that you can't actually
757
   appear for an email which is part of an mbox folder file, but that you
713
   visualize the folder (there will be an error dialog if you try). Recoll is
758
   can't actually visualize the folder (there will be an error dialog if you
714
   unfortunately not yet smart enough to disable the entry in this case.
759
   try). Recoll is unfortunately not yet smart enough to disable the entry in
760
   this case. In other cases, the Open option makes sense, for example to
761
   start a chm viewer on the parent document for a help page.
715
762
716
     ----------------------------------------------------------------------
763
     ----------------------------------------------------------------------
717
764
718
3.3. The preview window
765
3.3. The preview window
719
766
...
...
752
   A right-click menu in the text area allows switching between displaying
799
   A right-click menu in the text area allows switching between displaying
753
   the main text or the contents of fields associated to the document (ie:
800
   the main text or the contents of fields associated to the document (ie:
754
   author, abstract, etc.). This is especially useful in cases where the term
801
   author, abstract, etc.). This is especially useful in cases where the term
755
   match did not occur in the main text but in one of the fields.
802
   match did not occur in the main text but in one of the fields.
756
803
804
   You can print the current preview window contents by typing ^P (Ctrl + P)
805
   in the window text.
806
757
     ----------------------------------------------------------------------
807
     ----------------------------------------------------------------------
758
808
759
3.4. The query language
809
3.4. The query language
760
810
761
   The query language processor is activated on the simple search entry when
811
   The query language processor is activated on the simple search entry when
...
...
846
896
847
   You can use the show query link at the top of the result list to check the
897
   You can use the show query link at the top of the result list to check the
848
   exact query which was finally executed by Xapian.
898
   exact query which was finally executed by Xapian.
849
899
850
   Most Xesam phrase modifiers are unsupported, except for l (small ell) to
900
   Most Xesam phrase modifiers are unsupported, except for l (small ell) to
851
   disable stemming, and p to turn an phrase into a NEAR (unordered) search.
901
   disable stemming, and p to turn a phrase into a NEAR (unordered) search.
852
   Example: "prejudice pride"p
902
   Example: "prejudice pride"p
853
903
854
     ----------------------------------------------------------------------
904
     ----------------------------------------------------------------------
855
905
856
3.5. Complex/advanced search
906
3.5. Complex/advanced search
...
...
1160
   Browsing the result list inside a preview window. Entering Shift-Down or
1210
   Browsing the result list inside a preview window. Entering Shift-Down or
1161
   Shift-Up (Shift + an arrow key) in a preview window will display the next
1211
   Shift-Up (Shift + an arrow key) in a preview window will display the next
1162
   or the previous document from the result list. Any secondary search
1212
   or the previous document from the result list. Any secondary search
1163
   currently active will be executed on the new document.
1213
   currently active will be executed on the new document.
1164
1214
1215
   Scrolling the result list from the keyboard. You can use PageUp and
1216
   PageDown to scroll the result list, Shift+Home to go back to the first
1217
   page. These work even while the focus is in the search entry.
1218
1165
   Forced opening of a preview window. You can use Shift+Click on a result
1219
   Forced opening of a preview window. You can use Shift+Click on a result
1166
   list Preview link to force the creation of a preview window instead of a
1220
   list Preview link to force the creation of a preview window instead of a
1167
   new tab in the existing one.
1221
   new tab in the existing one.
1168
1222
1169
   Closing previews. Entering ^W in a tab will close it (and, for the last
1223
   Closing previews. Entering ^W in a tab will close it (and, for the last
1170
   tab, close the preview window). Entering Esc will close the preview window
1224
   tab, close the preview window). Entering Esc will close the preview window
1171
   and all its tabs.
1225
   and all its tabs.
1172
1226
1227
   Printing previews. Entering ^P in a preview window will print the
1228
   currently displayed text.
1229
1173
   Quitting. Entering ^Q almost anywhere will close the application.
1230
   Quitting. Entering ^Q almost anywhere will close the application.
1174
1231
1175
     ----------------------------------------------------------------------
1232
     ----------------------------------------------------------------------
1176
1233
1177
3.12. Customizing the search interface
1234
3.12. Customizing the search interface
1178
1235
1179
   It is possible to customize some aspects of the search interface by using
1236
   You can customize some aspects of the search interface by using the Query
1180
   Query configuration entry in the Preferences menu.
1237
   configuration entry in the Preferences menu.
1181
1238
1182
   There are two tabs in the dialog, dealing with the interface itself, and
1239
   There are several tabs in the dialog, dealing with the interface itself,
1183
   with the parameters used for searching and returning results.
1240
   the parameters used for searching and returning results, and what indexes
1241
   are searched.
1184
1242
1185
   User interface parameters:
1243
   User interface parameters:
1186
1244
1187
     * Number of results in a result page:
1245
     * Number of results in a result page:
1188
1246
...
...
1198
       result list, and you may want to customize the font and/or font size.
1256
       result list, and you may want to customize the font and/or font size.
1199
       The rest of the fonts used by Recoll are determined by your generic QT
1257
       The rest of the fonts used by Recoll are determined by your generic QT
1200
       config (try the qtconfig command).
1258
       config (try the qtconfig command).
1201
1259
1202
     * Result paragraph format string: allows you to change the presentation
1260
     * Result paragraph format string: allows you to change the presentation
1203
       of each result list entry. This is a qt-html string where the
1261
       of each result list entry. This is described in its own section.
1204
       following printf-like % substitutions will be performed:
1205
1262
1263
     * Maximum text size highlighted for preview: inserting highlights on
1264
       search terms inside the text before inserting it in the preview window
1265
       involves quite a lot of processing, and can be disabled over the given
1266
       text size to speed up loading.
1267
1268
     * Use desktop preferences to choose document editor: if this is checked,
1269
       the xdg-open utility will be used to open files when you click the
1270
       Edit link in the result list, instead of the application defined in
1271
       mimeview. xdg-open will in turn use your desktop preferences to choose
1272
       an appropriate application.
1273
1274
     * Choose editor applications: this will let you choose the command
1275
       started by the Edit links inside the result list, for specific
1276
       document types.
1277
1278
     * Display category filter as toolbar...: this will let you choose if the
1279
       document categories are displayed as a list or a set of buttons.
1280
1281
     * Auto-start simple search on white space entry: if this is checked, a
1282
       search will be executed each time you enter a space in the simple
1283
       search input field. This lets you look at the result list as you enter
1284
       new terms. This is off by default, you may like it or not...
1285
1286
     * Start with advanced search dialog open and Start with sort dialog
1287
       open: If you use these dialogs all the time, checking these entries
1288
       will get them to open when recoll starts.
1289
1290
     * Remember sort activation state: if set, Recoll will remember the sort
1291
       tool state between invocations. It normally starts with sorting
1292
       disabled.
1293
1294
     * Prefer HTML to plain text for preview: if set, Recoll will display HTML
1295
       as such inside the preview window. If this causes problems with the Qt
1296
       HTML display, you can uncheck it to display the plain text version
1297
       instead.
1298
1299
   Search parameters:
1300
1301
     * Stemming language: stemming obviously depends on the document's
1302
       language. This listbox will let you choose among the stemming databases
1303
       which were built during indexing (this is set in the main
1304
       configuration file), or later added with recollindex -s (See the
1305
       recollindex manual). Stemming languages which are dynamically added
1306
       will be deleted at the next indexing pass unless they are also added
1307
       in the configuration file.
1308
1309
     * Dynamically add phrase to simple searches: a phrase will be
1310
       automatically built and added to simple searches when looking for Any
1311
       terms. This will give a relevance boost to the results where the
1312
       search terms appear as a phrase (consecutive and in order).
1313
1318
     * Dynamically build abstracts: this decides if Recoll tries to build
1319
       document abstracts when displaying the result list. Abstracts are
1320
       constructed by taking context from the document information, around
1321
       the search terms. This can slow down result list display significantly
1322
       for big documents, and you may want to turn it off.
1323
1324
     * Replace abstracts from documents: this decides if we should synthesize
1325
       and display an abstract in place of an explicit abstract found within
1326
       the document itself.
1327
1328
     * Synthetic abstract size: adjust to taste...
1329
1330
     * Synthetic abstract context words: how many words should be displayed
1331
       around each term occurrence.
1332
1333
   External indexes: This panel will let you browse for additional indexes
1334
   that you may want to search. External indexes are designated by their
1335
   database directory (ie: /home/someothergui/.recoll/xapiandb,
1336
   /usr/local/recollglobal/xapiandb).
1337
1338
   Once entered, the indexes will appear in the External indexes list, and
1339
   you can choose which ones you want to use at any moment by checking or
1340
   unchecking their entries.
1341
1342
   Your main database (the one the current configuration indexes to), is
1343
   always implicitly active. If this is not desirable, you can set up your
1344
   configuration so that it indexes, for example, an empty directory. An
1345
   alternative indexer may also need to implement a way of purging the index
1346
   from stale data.
1347
1348
     ----------------------------------------------------------------------
1349
1350
  3.12.1. The result list paragraph format
1351
1352
   The presentation of each result inside the result list can be customized
1353
   by setting the result list paragraph format inside the User Interface tab
1354
   of the Query configuration.
1355
1356
   This is a Qt HTML string where the following printf-like % substitutions
1357
   will be performed:
1358
1206
          * %A. Abstract
1359
     * %A. Abstract
1207
1360
1208
          * %D. Date
1361
     * %D. Date
1209
1362
1210
          * %I. Icon image name
1363
     * %I. Icon image name
1211
1364
1212
          * %K. Keywords (if any)
1365
     * %K. Keywords (if any)
1213
1366
1214
          * %L. Preview and Edit links
1367
     * %L. Preview and Edit links
1215
1368
1216
          * %M. Mime type
1369
     * %M. Mime type
1217
1370
1218
          * %N. result Number
1371
     * %N. result Number
1219
1372
1220
          * %R. Relevance percentage
1373
     * %R. Relevance percentage
1221
1374
1222
          * %S. Size information
1375
     * %S. Size information
1223
1376
1224
          * %T. Title
1377
     * %T. Title
1225
1378
1226
          * %U. Url
1379
     * %U. Url
1227
1380
1381
   The format of the Preview and Edit links is <a href="P%N"> and <a
1382
   href="E%N">, where %N expands to the document number inside the result
1383
   list.
1384
1385
   In addition to the predefined values above, all strings like %(fieldname)
1386
   will be replaced by the value of the field named fieldname for this
1387
   document. Only stored fields can be accessed in this way, the value of
1388
   indexed but not stored fields is not known at this point in the search
1389
   process (see field configuration). There are currently very few fields
1390
   stored by default, apart from the values above (only author), so this
1391
   feature will need some custom local configuration to be useful. For
1392
   example, you could look at the fields for the document types of interest
1393
   (use the right-click menu inside the preview window), and add what you
1394
   want to the list of stored fields. A candidate example would be the
1395
   recipient field which is generated by the message filters.
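
   The substitution rules described above can be illustrated with a small
   Python sketch. This is not Recoll's own code: the function name
   expand_format, the dictionaries passed to it, and the choice to leave
   unknown single-letter codes untouched are assumptions made for the
   example.

```python
import re

# Matches %(fieldname) or a single-letter substitution such as %T.
_PATTERN = re.compile(r"%\((\w+)\)|%([A-Za-z])")


def expand_format(fmt, values, fields=None):
    """Expand %X and %(fieldname) references in a paragraph format string.

    values maps single letters ("T", "U", "N", ...) to strings.
    fields maps stored field names to strings (hypothetical data).
    """
    fields = fields or {}

    def repl(match):
        if match.group(1) is not None:
            # %(fieldname): a field without a stored value expands to nothing.
            return fields.get(match.group(1), "")
        # Single letter: unknown letters are left as they are (assumption).
        return values.get(match.group(2), match.group(0))

    return _PATTERN.sub(repl, fmt)


if __name__ == "__main__":
    print(expand_format('<a href="P%N">%T</a> %(author)',
                        {"N": "3", "T": "Ilur"},
                        {"author": "jf"}))
```

   With the sample values, the title becomes a preview link for result
   number 3, followed by the value of the stored author field.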
1396
1228
       The default value for the string is:
1397
   The default value for the paragraph format string is:
1229
1398
1230
 <img src="%I" align="left">%R %S %L &nbsp;&nbsp;<b>%T</b><br>
1399
 <img src="%I" align="left">%R %S %L &nbsp;&nbsp;<b>%T</b><br>
1231
 %M&nbsp;%D&nbsp;&nbsp;&nbsp;<i>%U</i><br>
1400
 %M&nbsp;%D&nbsp;&nbsp;&nbsp;<i>%U</i>&nbsp;%i<br>
1232
 %A %K
1401
 %A %K
1233
       
1402
       
1234
1403
1235
       You may, for example, try the following for a more web-like
1404
   You may, for example, try the following for a more web-like experience:
1236
       experience:
1237
1405
1238
 <u><b><a href="P%N">%T</a></b></u><br>
1406
 <u><b><a href="P%N">%T</a></b></u><br>
1239
 %A<font color=#008000>%U - %S</font> - %L
1407
 %A<font color=#008000>%U - %S</font> - %L
1240
       
1408
       
1241
1409
1242
       Or the clean looking:
1410
   Or the clean looking:
1243
1411
1244
 <img src="%I" align="left">%L <font color="#900000">%R</font>
1412
 <img src="%I" align="left">%L <font color="#900000">%R</font>
1245
   <b>%T</b><br>%S 
1413
   <b>%T</b><br>%S 
1246
 <font color="#808080"><i>%U</i></font>
1414
 <font color="#808080"><i>%U</i></font>
1247
 <table bgcolor="#e0e0e0">
1415
 <table bgcolor="#e0e0e0">
1248
 <tr><td><div>%A</div></td></tr>
1416
 <tr><td><div>%A</div></td></tr>
1249
 </table>%K
1417
 </table>%K
1250
       
1418
       
1251
1419
1252
       The format of the Preview and Edit links is <a href="Pdocnum"> and <a
1420
   Note that the P%N link in the above paragraph makes the title a preview
1253
       href="Edocnum"> where docnum is what %N would print. This makes the
1421
   link.
1254
       title a preview link in the above format.
1255
1422
1256
       Please note that, due to the way the program handles right mouse
1423
   Due to the way the program handles right mouse clicks in the result list,
1257
       clicks in the result list, if the custom formatting results in
1424
   if the custom formatting results in multiple paragraphs per result, right
1258
       multiple paragraphs per result, right clicks will only work inside the
1425
   clicks will only work inside the first one.
1259
       first one.
1260
1261
     * HTML help browser: this will let you choose your preferred browser
1262
       which will be started from the Help menu to read the user manual. You
1263
       can enter a simple name if the command is in your PATH, or browse for
1264
       a full pathname.
1265
1266
     * Auto-start simple search on white space entry: if this is checked, a
1267
       search will be executed each time you enter a space in the simple
1268
       search input field. This lets you look at the result list as you enter
1269
       new terms. This is off by default, you may like it or not...
1270
1271
     * Start with advanced search dialog open and Start with sort dialog
1272
       open: If you use these dialogs all the time, checking these entries
1273
       will get them to open when recoll starts.
1274
1275
     * Use desktop preferences to choose document editor: if this is checked,
1276
       the xdg-open utility will be used to open files when you click the
1277
       Edit link in the result list, instead of the application defined in
1278
       mimeview. xdg-open will in turn use your desktop preferences to choose
1279
       an appropriate application.
1280
1281
   Search parameters:
1282
1283
     * Stemming language: stemming obviously depends on the document's
1284
       language. This listbox will let you choose among the stemming databases
1285
       which were built during indexing (this is set in the main
1286
       configuration file), or later added with recollindex -s (See the
1287
       recollindex manual). Stemming languages which are dynamically added
1288
       will be deleted at the next indexing pass unless they are also added
1289
       in the configuration file.
1290
1291
     * Dynamically build abstracts: this decides if Recoll tries to build
1292
       document abstracts when displaying the result list. Abstracts are
1293
       constructed by taking context from the document information, around
1294
       the search terms. This can slow down result list display significantly
1295
       for big documents, and you may want to turn it off.
1296
1297
     * Replace abstracts from documents: this decides if we should synthesize
1298
       and display an abstract in place of an explicit abstract found within
1299
       the document itself.
1300
1301
     * Synthetic abstract size: adjust to taste...
1302
1303
     * Synthetic abstract context words: how many words should be displayed
1304
       around each term occurrence.
1305
1306
   External indexes: This panel will let you browse for additional indexes
1307
   that you may want to search. External indexes are designated by their
1308
   database directory (ie: /home/someothergui/.recoll/xapiandb,
1309
   /usr/local/recollglobal/xapiandb).
1310
1311
   Once entered, the indexes will appear in the External indexes list, and
1312
   you can choose which ones you want to use at any moment by checking or
1313
   unchecking their entries.
1314
1315
   Your main database (the one the current configuration indexes to), is
1316
   always implicitly active. If this is not desirable, you can set up your
1317
   configuration so that it indexes, for example, an empty directory. An
1318
   alternative indexer may also need to implement a way of purging the index
1319
   from stale data,
1320
1426
1321
     ----------------------------------------------------------------------
1427
     ----------------------------------------------------------------------
1322
1428
1323
                  Chapter 4. Searching with the KDE KIO slave
1429
                  Chapter 4. Searching with the KDE KIO slave
1324
1430
...
...
1411
1517
1412
 recollq 'ilur -nautique mime:text/html'
1518
 recollq 'ilur -nautique mime:text/html'
1413
 Recoll query: ((((ilur:(wqf=11) OR ilurs) AND_NOT (nautique:(wqf=11)
1519
 Recoll query: ((((ilur:(wqf=11) OR ilurs) AND_NOT (nautique:(wqf=11)
1414
   OR nautiques OR nautiqu OR nautiquement)) FILTER Ttext/html))
1520
   OR nautiques OR nautiqu OR nautiquement)) FILTER Ttext/html))
1415
 4 results
1521
 4 results
1416
 text/html   [file:///Users/uncrypted-dockes/projets/bateaux/ilur/comptes.html]  [comptes.html]  18593   bytes  
1522
 text/html       [file:///Users/uncrypted-dockes/projets/bateaux/ilur/comptes.html]      [comptes.html]  18593   bytes  
1417
 text/html   [file:///Users/uncrypted-dockes/projets/nautique/webnautique/articles/ilur1/index.html] [Constructio...
1523
 text/html       [file:///Users/uncrypted-dockes/projets/nautique/webnautique/articles/ilur1/index.html] [Constructio...
1418
 text/html   [file:///Users/uncrypted-dockes/projets/pagepers/index.html]    [psxtcl/writemime/recoll]...
1524
 text/html       [file:///Users/uncrypted-dockes/projets/pagepers/index.html]    [psxtcl/writemime/recoll]...
1419
 text/html   [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/recu-chasse-maree....
1525
 text/html       [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/recu-chasse-maree....
1420
1526
1421
     ----------------------------------------------------------------------
1527
     ----------------------------------------------------------------------
1422
1528
1423
                        Chapter 6. Programming interface
1529
                        Chapter 6. Programming interface
1424
1530
...
...
1437
1543
1438
   Recoll filters are executable programs which translate from a specific
1544
   Recoll filters are executable programs which translate from a specific
1439
   format (ie: openoffice, acrobat, etc.) to the Recoll indexing input
1545
   format (ie: openoffice, acrobat, etc.) to the Recoll indexing input
1440
   format, which may be text/plain or text/html.
1546
   format, which may be text/plain or text/html.
1441
1547
1548
   As of Recoll 1.13, there are two kinds of filters:
1549
1550
     * Simple filters (the old ones) run once and exit. They can be bare
1551
       programs like antiword, or shell-scripts using other programs. They
1552
       are very simple to write, just having to write the text to the
1553
       standard output.
1554
1555
     * Multiple filters, new in 1.13, run as long as their master process
1556
       (ie: recollindex) is active. They can process multiple files (sparing
1557
       the process startup time which can be very significant), or multiple
1558
       documents per file (ie: for zip or chm files). They communicate with
1559
       the indexer through a simple protocol, but are nevertheless a bit more
1560
       complicated than the older kind. Most of these new filters are written
1561
       in Python, using a common module to handle the protocol.
1562
1563
   The following will just describe the simple filters. If you are programmer
1564
   enough to write one of the other kind, it shouldn't be too difficult to
1565
   make sense of one of the existing modules (ie: rclzip).
1566
1442
   Recoll filters are usually shell-scripts, but this is in no way necessary.
1567
   Recoll simple filters are usually shell-scripts, but this is in no way
1443
   These programs are extremely simple and most of the difficulty lies in
1568
   necessary. These programs are extremely simple and most of the difficulty
1444
   extracting the text from the native format, not outputting what is
1569
   lies in extracting the text from the native format, not outputting what is
1445
   expected by Recoll. Happily enough, most document formats already have
1570
   expected by Recoll. Happily enough, most document formats already have
1446
   translators or text extractors which handle the difficult part and can be
1571
   translators or text extractors which handle the difficult part and can be
1447
   called from the filter. In some cases the output of the translating program
1572
   called from the filter. In some cases the output of the translating program
1448
   is appropriate, and no intermediate shell-script is needed.
1573
   is appropriate, and no intermediate shell-script is needed.
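
   As an illustration of how little a simple filter has to do, here is a
   minimal sketch in Python. It assumes that the indexer passes the path of
   the file to process as the last command line argument, and that the
   filter is declared as producing text/plain in utf-8. The names
   extract_text and run_filter are hypothetical, and the "extraction" is a
   placeholder for whatever tool really decodes the native format.

```python
import sys


def extract_text(data):
    """Placeholder extraction step: decode bytes and trim trailing blanks.

    A real filter would call a format-specific translator here.
    """
    text = data.decode("utf-8", errors="replace")
    return "\n".join(line.rstrip() for line in text.splitlines())


def run_filter(path, out):
    """Read the input file and write its text to the output stream."""
    with open(path, "rb") as handle:
        out.write(extract_text(handle.read()) + "\n")
    return 0


if __name__ == "__main__" and len(sys.argv) == 2:
    # Emit utf-8 whatever the locale is, to match the declared charset.
    sys.stdout.reconfigure(encoding="utf-8")
    sys.exit(run_filter(sys.argv[1], sys.stdout))
```

   All the filter-specific work lives in extract_text; the rest only reads
   the file and writes the result to the standard output.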
1449
1574
...
...
1457
   The association of file types to filters is performed in the mimeconf
1582
   The association of file types to filters is performed in the mimeconf
1458
   file. A sample:
1583
   file. A sample:
1459
1584
1460
 
[index]
1585
 
[index]
1461
 application/msword = exec antiword -t -i 1 -m UTF-8;\
1586
 application/msword = exec antiword -t -i 1 -m UTF-8;\
1462
      mimetype=text/plain;charset=utf-8
1587
      mimetype = text/plain ; charset=utf-8
1463
1588
1464
 application/ogg = exec rclogg
1589
 application/ogg = exec rclogg
1465
1590
1466
 text/rtf = exec unrtf --nopict --html; charset=iso-8859-1; mimetype=text/html
1591
 text/rtf = exec unrtf --nopict --html; charset=iso-8859-1; mimetype=text/html
1467
1592
1593
 application/x-chm = execm rclchm
1594
1468
   The fragment specifies that:
1595
   The fragment specifies that:
1469
1596
1470
     * application/msword files are processed by executing the antiword
1597
     * application/msword files are processed by executing the antiword
1471
       program, which outputs text/plain encoded in iso-8859-1.
1598
       program, which outputs text/plain encoded in utf-8.
1472
1599
1473
     * application/ogg files are processed by the rclogg script, with default
1600
     * application/ogg files are processed by the rclogg script, with default
1474
       output type (text/html, with encoding specified in the header, or
1601
       output type (text/html, with encoding specified in the header, or
1475
       utf-8 by default).
1602
       utf-8 by default).
1476
1603
1477
     * text/rtf is processed by unrtf, which outputs text/html. The
1604
     * text/rtf is processed by unrtf, which outputs text/html. The
1478
       iso-8859-1 encoding is specified because it is not the utf-8 default,
1605
       iso-8859-1 encoding is specified because it is not the utf-8 default,
1479
       and not output by unrtf in the HTML header section.
1606
       and not output by unrtf in the HTML header section.
1607
1608
     * application/x-chm is processed by a persistent filter. This is
1609
       determined by the execm keyword.
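
   To make the structure of these entries explicit, the following Python
   sketch splits one value from the sample into its keyword, command and
   attributes. It only reflects what can be read from the fragment above
   (semicolon-separated parts, backslash line continuation) and is not the
   parser used by Recoll. The function name parse_filter_spec is
   hypothetical, and quoting is not handled.

```python
def parse_filter_spec(value):
    """Split a mimeconf-style value into (keyword, command, attributes)."""
    # A backslash at the end of a line continues the value on the next line.
    joined = value.replace("\\\n", "")
    parts = [part.strip() for part in joined.split(";")]

    # First part: "exec" or "execm", followed by the command and its options.
    keyword, *command = parts[0].split()

    # Remaining parts: name = value attributes such as mimetype or charset.
    attributes = {}
    for part in parts[1:]:
        if not part:
            continue
        name, _, val = part.partition("=")
        attributes[name.strip()] = val.strip()
    return keyword, command, attributes


if __name__ == "__main__":
    sample = ("exec antiword -t -i 1 -m UTF-8;\\\n"
              "      mimetype = text/plain ; charset=utf-8")
    print(parse_filter_spec(sample))
    print(parse_filter_spec("execm rclchm"))
```

   The keyword returned for the second sample is execm, which is what marks
   the filter as one that stays active to process several files.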
1480
1610
1481
   The easiest way to write a new filter is probably to start from an
1611
   The easiest way to write a new filter is probably to start from an
1482
   existing one.
1612
   existing one.
1483
1613
1484
   Filters which output text/plain text are generally simpler, but they
1614
   Filters which output text/plain text are generally simpler, but they
...
...
1550
   A field becomes indexed by having a prefix defined in the [prefixes]
1680
   A field becomes indexed by having a prefix defined in the [prefixes]
1551
   section of the fields file. See the comments in there for details
1681
   section of the fields file. See the comments in there for details
1552
1682
1553
   A field becomes stored by appearing in the [stored] section of the fields
1683
   A field becomes stored by appearing in the [stored] section of the fields
1554
   file.
1684
   file.
1685
1686
   See the comments inside the fields file for more details.
1555
1687
1556
     ----------------------------------------------------------------------
1688
     ----------------------------------------------------------------------
1557
1689
1558
6.3. API
1690
6.3. API
1559
1691
...
...
1837
1969
1838
     ----------------------------------------------------------------------
1970
     ----------------------------------------------------------------------
1839
1971
1840
                            Chapter 7. Installation
1972
                            Chapter 7. Installation
1841
1973
1842
7.1. Installing a prebuilt copy
1974
7.1. Installing a binary copy
1843
1975
1844
   Recoll binary packages from the Recoll web site are always linked
1976
   There are three types of binary Recoll installations:
1845
   statically to the Xapian libraries, and have no other dependencies. You
1977
1978
     * Through your system's normal software distribution framework (ie,
1979
       Debian/Ubuntu apt, FreeBSD ports, etc.).
1980
1981
     * From a package downloaded from the Recoll web site.
1982
1983
     * From a prebuilt tree downloaded from the Recoll web site.
1984
1985
   In all cases, the strict software dependencies (ie on Xapian or iconv)
1986
   will be automatically satisfied; you should not have to worry about them.
1987
1846
   will only have to check or install supporting applications for the file
1988
   You will only have to check or install supporting applications for the
1847
   types that you want to index beyond text, HTML and mail files, and maybe
1989
   file types that you want to index beyond those that are natively processed
1848
   have a look at the configuration section (but this may not be necessary
1990
   by Recoll (text, HTML, mail files, and a few others).
1991
1992
   You should also maybe have a look at the configuration section (but this
1849
   for a quick test with default parameters).
1993
   may not be necessary for a quick test with default parameters). Most
1994
   parameters can be more conveniently set from the GUI interface.
1850
1995
1851
     ----------------------------------------------------------------------
1996
     ----------------------------------------------------------------------
1852
1997
1853
  7.1.1. Installing through a package system
1998
  7.1.1. Installing through a package system
1854
1999
1855
   If you use a BSD-type port system or a prebuilt package (RPM or other),
2000
   If you use a BSD-type port system or a prebuilt package (DEB, RPM,
2001
   manually or through the system software configuration utility), just
1856
   just follow the usual procedure for your system.
2002
   follow the usual procedure for your system.
1857
2003
1858
     ----------------------------------------------------------------------
2004
     ----------------------------------------------------------------------
1859
2005
1860
  7.1.2. Installing a prebuilt Recoll
2006
  7.1.2. Installing a prebuilt Recoll
1861
2007
...
...
1874
2020
1875
7.2. Supporting packages
2021
7.2. Supporting packages
1876
2022
1877
   Recoll uses external applications to index some file types. You need to
2023
   Recoll uses external applications to index some file types. You need to
1878
   install them for the file types that you wish to have indexed (these are
2024
   install them for the file types that you wish to have indexed (these are
1879
   run-time dependencies. None is needed for building Recoll).
2025
   run-time optional dependencies. None is needed for building or running
2026
   Recoll except for indexing their specific file type).
1880
2027
1881
   After an indexing pass, the commands that were found missing can be
2028
   After an indexing pass, the commands that were found missing can be
1882
   displayed from the recoll File menu. The list is stored in the missing
2029
   displayed from the recoll File menu. The list is stored in the missing
1883
   text file inside the configuration directory.
2030
   text file inside the configuration directory.
1884
2031
...
...
1906
2053
1907
     * dvi: dvips
2054
     * dvi: dvips
1908
2055
1909
     * djvu: DjVuLibre
2056
     * djvu: DjVuLibre
1910
2057
1911
     * MP3: Recoll will use the id3info command from the id3lib package to
2058
     * mp3: Recoll will use the id3info command from the id3lib package to
1912
       extract tag information. Without it, only the file names will be
2059
       extract tag information. Without it, only the file names will be
1913
       indexed.
2060
       indexed.
1914
2061
2062
     * flac files need metaflac.
2063
2064
     * ogg files need ogginfo.
2065
1915
     * Pictures: Recoll uses the Exiftool Perl package to extract tag
2066
     * Pictures: Recoll uses the Exiftool Perl package to extract tag
1916
       information. Most image file formats are supported.
2067
       information. Most image file formats are supported. Note that there
2068
       may not be much interest in indexing the technical tags (image size,
2069
       aperture, etc.). This is only of interest if you store personal tags
2070
       or textual descriptions inside the image files.
1917
2071
2072
     * chm: files in Microsoft help format need Python and the pychm module
2073
       (which needs chmlib).
2074
2075
     * ics: iCalendar files need Python and the icalendar module.
2076
2077
     * zip: Zip archives need Python (and the standard zipfile module).
2078
1918
   Text, HTML, mail folders Openoffice and Scribus files are processed
2079
   Text, HTML, mail folders, Openoffice and Scribus files are processed
1919
   internally. Lyx is used to index Lyx files. Many filters need sed and awk.
2080
   internally. Lyx is used to index Lyx files. Many filters need sed and awk.
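
   Before a first indexing pass, you may want to check which helper commands
   are already installed. The following Python sketch looks them up in the
   PATH. It is only a convenience example, not the mechanism that Recoll
   uses to build its own list of missing commands, and the HELPERS list is
   an assumption limited to command names mentioned in this document.

```python
import shutil

# Helper commands named in this document (not an exhaustive list).
HELPERS = ["antiword", "unrtf", "dvips", "id3info", "metaflac", "ogginfo"]


def find_missing(commands):
    """Return the commands that cannot be found in the PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    for cmd in find_missing(HELPERS):
        print("missing helper:", cmd)
```

   A missing helper only matters if you want to index the corresponding
   file type.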
1920
2081
1921
     ----------------------------------------------------------------------
2082
     ----------------------------------------------------------------------
1922
2083
1923
7.3. Building from source
2084
7.3. Building from source
1924
2085
1925
  7.3.1. Prerequisites
2086
  7.3.1. Prerequisites
1926
2087
1927
   At the very least, you will need to download and install the xapian core
2088
   At the very least, you will need to download and install the xapian core
1928
   package (Recoll 1.9 normally uses version 1.0.2, but any 0.9 or 1.0.x
2089
   package and the qt run-time and development packages. Check the Recoll
1929
   version will work too), and the qt run-time and development packages
2090
   download page for up to date version information.
1930
   (Recoll development currently uses version 3.3.5, but any 3.3 version is
1931
   probably OK).
1932
2091
1933
   You will most probably be able to find a binary package for qt for your
2092
   You will most probably be able to find a binary package for qt for your
1934
   system. You may have to compile Xapian but this is not difficult (if you
2093
   system. You may have to compile Xapian but this is not difficult (if you
1935
   are using FreeBSD, there is a port).
2094
   are using FreeBSD, there is a port).
1936
2095
...
...
1940
2099
1941
     ----------------------------------------------------------------------
2100
     ----------------------------------------------------------------------
1942
2101
1943
  7.3.2. Building
2102
  7.3.2. Building
1944
2103
1945
   Recoll has been built on Linux (redhat7.3, mandriva 2005/6, Fedora Core
2104
   Recoll has been built on Linux, FreeBSD, macosx, and Solaris. Most
1946
   3/4/5/6), FreeBSD 5/6, macosx, and Solaris 8. If you build on another
2105
   versions after 2005 should be ok, maybe some older ones too (Solaris 8 is
1947
   system, and need to modify things, I would very much welcome patches.
2106
   ok). If you build on another system, and need to modify things, I would
2107
   very much welcome patches.
1948
2108
1949
   Depending on the qt configuration on your system, you may have to set the
2109
   Depending on the qt configuration on your system, you may have to set the
1950
   QTDIR and QMAKESPECS variables in your environment:
2110
   QTDIR and QMAKESPECS variables in your environment:
1951
2111
1952
     * QTDIR should point to the directory above the one that holds the qt
2112
     * QTDIR should point to the directory above the one that holds the qt
...
...
1955
2115
1956
     * QMAKESPECS should be set to the name of one of the qt mkspecs
2116
     * QMAKESPECS should be set to the name of one of the qt mkspecs
1957
       sub-directories (ie: linux-g++).
2117
       sub-directories (ie: linux-g++).
1958
2118
1959
   On many Linux systems, QTDIR is set by the login scripts, and QMAKESPECS
2119
   On many Linux systems, QTDIR is set by the login scripts, and QMAKESPECS
1960
   is not needed because there is a default link in mkspecs/.
2120
   is not needed because there is a default link in mkspecs/. Neither should
2121
   be needed with Qt 4.
1961
2122
1962
   Configure options: --without-aspell will disable the code for phonetic
2123
   Configure options:
1963
   matching of search terms. --with-fam or --with-inotify will enable the
2124
2125
     * --without-aspell will disable the code for phonetic matching of search
2126
       terms.
2127
2128
     * --with-fam or --with-inotify will enable the code for real time
1964
   code for real time indexing. Inotify support is enabled by default on
2129
       indexing. Inotify support is enabled by default on recent Linux
1965
   recent Linux systems.
2130
       systems.
2131
2132
     * --enable-xattr will enable code to fetch data from file extended
2133
       attributes. This is only useful if some application stores data in
2134
       there, and also needs some simple configuration (see comments in the
2135
       fields configuration file).
2136
2137
     * --with-file-command Specify the version of the 'file' command to use
2138
       (ie: --with-file-command=/usr/local/bin/file). Can be useful to enable
2139
       the gnu version on systems where the native one is bad.
2140
2141
     * --without-gui Disable the Qt interface, and auxiliary uses of X11, and
2142
       compile the command line version.
1966
2143
1967
   Normal procedure:
2144
   Normal procedure:
1968
2145
1969
         cd recoll-xxx
2146
         cd recoll-xxx
1970
         configure
2147
         configure
1971
         make
2148
         make
1972
         (practices usual hardship-repelling invocations)
2149
         (practices usual hardship-repelling invocations)
1973
     
2150
     
1974
2151
1975
   There little auto-configuration. The configure script will mainly link one
2152
   There is little auto-configuration. The configure script will mainly link
1976
   of the system-specific files in the mk directory to mk/sysconf. If your
2153
   one of the system-specific files in the mk directory to mk/sysconf. If
1977
   system is not known yet, it will tell you as much, and you may want to
2154
   your system is not known yet, it will tell you as much, and you may want
1978
   manually copy and modify one of the existing files (the new file name
2155
   to manually copy and modify one of the existing files (the new file name
1979
   should be the output of uname -s).
2156
   should be the output of uname -s).
1980
2157
1981
     ----------------------------------------------------------------------
2158
     ----------------------------------------------------------------------
1982
2159
1983
  7.3.3. Installation
2160
  7.3.3. Installation
...
...
2077
   The default configuration will index your home directory. If this is not
2254
   The default configuration will index your home directory. If this is not
2078
   appropriate, start recoll to create a blank configuration, click Cancel,
2255
   appropriate, start recoll to create a blank configuration, click Cancel,
2079
   and edit the configuration file before restarting the command. This will
2256
   and edit the configuration file before restarting the command. This will
2080
   start the initial indexing, which may take some time.
2257
   start the initial indexing, which may take some time.
2081
2258
2082
   Paramers:
2259
   Parameters affecting what we index:
2083
2260
2084
   topdirs
2261
   topdirs
2085
2262
2086
           Specifies the list of directories or files to index (recursively
2263
           Specifies the list of directories or files to index (recursively
2087
           for directories). The indexer will not follow symbolic links
2264
           for directories). The indexer will not follow symbolic links
2088
           inside the indexed trees by default (see the followLinks options
2265
           inside the indexed trees by default (see the followLinks options
2089
           though).
2266
           though).
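
           As an illustration only, a configuration fragment along these
           lines would index three trees and follow symbolic links inside
           one of them. The paths are hypothetical, and the 1/0 spelling for
           the boolean value is an assumption.

```
# Hypothetical paths; substitute your own directories or files.
topdirs = ~/Documents ~/Mail /srv/shared/notes

# Per-entry override, written as a section named after a topdirs member.
# Assumption: boolean parameters accept 1/0.
[~/Documents]
followLinks = 1
```

           Global parameters are kept above the first section header here,
           on the assumption that anything after a header belongs to that
           section.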
2090
2267
2091
   dbdir
2092
2093
           The name of the Xapian data directory. It will be created if
2094
           needed when the index is initialized. If this is not an absolute
2095
           path, it will be interpreted relative to the configuration
2096
           directory. The value can have embedded spaces but starting or
2097
           trailing spaces will be trimmed. You cannot use quotes here.
2098
2099
   skippedNames
2268
   skippedNames
2100
2269
2101
           A space-separated list of patterns for names of files or
2270
           A space-separated list of patterns for names of files or
2102
           directories that should be completely ignored. The list defined in
2271
           directories that should be completely ignored. The list defined in
2103
           the default file is:
2272
           the default file is:
2104
2273
2105
 skippedNames = #* bin CVS  Cache cache* caughtspam  tmp .thumbnails .svn \
2274
 skippedNames = #* bin CVS  Cache cache* caughtspam  tmp .thumbnails .svn \
2106
          *~ recollrc
2275
            *~ .beagle .git .hg .bzr loop.ps .xsession-errors \
2276
            .recoll* xapiandb recollrc recoll.conf
2107
2277
2108
           The list can be redefined for sub-directories, but is only
2278
           The list can be redefined at any sub-directory in the indexed
2109
           actually changed for the top level ones in topdirs.
2279
           area.
2110
2280
2111
           The top-level directories are not affected by this list (that is,
2281
           The top-level directories are not affected by this list (that is,
2112
           a directory in topdirs might match and would still be indexed).
2282
           a directory in topdirs might match and would still be indexed).
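
           A minimal sketch of redefining the list for one sub-directory,
           assuming a hypothetical ~/src tree inside the indexed area and
           patterns chosen purely for illustration:

```
# Hypothetical sub-directory of an indexed tree. Inside it, this list
# replaces the one inherited from the top of the file.
[~/src]
skippedNames = #* CVS .svn .git .hg *.o *~
```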
2113
2283
2114
           The list in the default configuration does not exclude hidden
2284
           The list in the default configuration does not exclude hidden
...
...
2147
           avoid multiple indexing of linked files. No effort is made to
2317
           avoid multiple indexing of linked files. No effort is made to
2148
           avoid duplication when this option is set to true. This option can
2318
           avoid duplication when this option is set to true. This option can
2149
           be set individually for each of the topdirs members by using
2319
           be set individually for each of the topdirs members by using
2150
           sections. It can not be changed below the topdirs level.
2320
           sections. It can not be changed below the topdirs level.
2151
2321
   indexedmimetypes

           Recoll normally indexes any file which it knows how to read. This
           list lets you restrict the indexed mime types to what you specify.
           If the variable is unspecified or the list empty (the default),
           all supported types are processed.

   compressedfilemaxkbs

           Size limit for compressed (.gz or .bz2) files. These need to be
           decompressed in a temporary directory for identification, which
           can be very wasteful if 'uninteresting' big compressed files are
           present. Negative means no limit, 0 means no processing of any
           compressed file. Defaults to -1.

   textfilemaxmbs

           Maximum size for text files. Very big text files are often
           uninteresting logs. Set to -1 to disable (default 20MB).

   textfilepagekbs

           If set to other than -1, text files will be indexed as multiple
           documents of the given page size. This may be useful if you do
           want to index very big text files as it will both reduce memory
           usage at index time and help with loading data to the preview
           window. A size of a few megabytes would seem reasonable (default:
           1MB).
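
           The three size parameters above could be combined as in the
           following sketch. The values are arbitrary examples, and the
           units (kilobytes or megabytes) are inferred from the parameter
           names.

```
# Skip compressed files larger than about 20 MB (value in kilobytes).
compressedfilemaxkbs = 20000
# Skip text files larger than 50 MB.
textfilemaxmbs = 50
# Index the remaining big text files as documents of about 2 MB each.
textfilepagekbs = 2000
```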

   indexallfilenames

           Recoll indexes file names in a special section of the database to
           allow specific file names searches using wild cards. This
           parameter decides if file name indexing is performed only for
           files with mime types that would qualify them for full text
           indexing, or for all files inside the selected subtrees,
           independently of mime type.

   usesystemfilecommand

           Decide if we use the file -i system command as a final step for
           determining the mime type for a file (the main procedure uses
           suffix associations as defined in the mimemap file). This can be
           useful for files with suffix-less names, but it will also cause
           the indexing of many bogus "text" files.

   processbeaglequeue

           If this is set, process the directory where Beagle Web browser
           plugins copy visited pages for indexing. Of course, Beagle MUST
           NOT be running, else things will behave strangely.

   beaglequeuedir

           The path to the Beagle indexing queue. This is hard-coded in the
           Beagle plugin as ~/.beagle/ToIndex so there should be no need to
           change it.

   Parameters affecting where and how we store things:

   dbdir

           The name of the Xapian data directory. It will be created if
           needed when the index is initialized. If this is not an absolute
           path, it will be interpreted relative to the configuration
           directory. The value can have embedded spaces but starting or
           trailing spaces will be trimmed. You cannot use quotes here.

   maxfsoccuppc

           Maximum file system occupation before we stop indexing. The value
           is a percentage, corresponding to what the "Capacity" df output
           column shows. The default value is 0, meaning no checking.

   mboxcachedir

           The directory where mbox message offsets cache files are held.
           This is normally $RECOLL_CONFDIR/mboxcache, but it may be useful
           to share a directory between different configurations.

   mboxcacheminmbs

           The minimum mbox file size over which we cache the offsets. There
           is really no sense in caching offsets for small files. The default
           is 5 MB.

   webcachedir

           This is only used by the Beagle web browser plugin indexing code,
           and defines where the cache for visited pages will live. Default:
           $RECOLL_CONFDIR/webcache

   webcachemaxmbs

           This is only used by the Beagle web browser plugin indexing code,
           and defines the maximum size for the web page cache. Default: 40
           MB.

   idxflushmb

           Threshold (megabytes of new text data) where we flush from memory
           to disk index. Setting this can help control memory usage. A value
           of 0 means no explicit flushing, letting Xapian use its own
           default, which is flushing every 10000 documents (memory usage
           depends on average document size). The default value is 10.
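
           A storage-related fragment might look like the sketch below. The
           values are examples only, not recommendations.

```
# Relative path: interpreted relative to the configuration directory.
dbdir = xapiandb
# Stop indexing when the file system is 90% full.
maxfsoccuppc = 90
# Flush to the disk index after every 50 MB of new text data.
idxflushmb = 50
```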

   Miscellaneous:
2429
2152
   loglevel,daemloglevel
2430
   loglevel,daemloglevel
2153
2431
2154
           Verbosity level for recoll and recollindex. A value of 4 lists
2432
           Verbosity level for recoll and recollindex. A value of 4 lists
2155
           quite a lot of debug/information messages. 2 only lists errors.
2433
           quite a lot of debug/information messages. 2 only lists errors.
2156
           The daemversion is specific to the indexing monitor daemon.
2434
           The daemloglevel version is specific to the indexing monitor daemon.
...
...
2176
           character set definition (ie: plain text files). This can be
2454
           character set definition (ie: plain text files). This can be
2177
           redefined for any sub-directory. If it is not set at all, the
2455
           redefined for any sub-directory. If it is not set at all, the
2178
           character set used is the one defined by the nls environment
2456
           character set used is the one defined by the nls environment
2179
           (LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
2457
           (LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
2180
2458
2181
   maxfsoccuppc
2459
   filtermaxseconds
2182
2460
2183
           Maximum file system occupation before we stop indexing. The value
2461
           Maximum filter execution time, after which it is aborted. Some
2184
           is a percentage, corresponding to what the "Capacity" df output
2462
           postscript programs just loop...
2185
           column shows. The default value is 0, meaning no checking.
2186
2463
2187
   idxflushmb
2464
   maildefcharset
2188
2465
2189
           Threshold (megabytes of new text data) where we flush from memory
2466
           This can be used to define the default character set specifically
2190
           to disk index. Setting this can help control memory usage. A value
2467
           for mail messages which don't specify it. This is mainly useful
2191
           of 0 means no explicit flushing, letting Xapian use its own
2468
           for readpst (libpst) dumps, which are utf-8 but do not say so.
2192
           default, which is flushing every 10000 documents (memory usage
2469
2193
           depends on average document size). The default value is 10.
2470
   localfields
2471
2472
           This allows setting fields for all documents under a given
2473
           directory. Typical usage would be to set an "rclaptg" field, to be
2474
           used in mimeview to select a specific viewer. Ie:
2475
           localfields=rclaptg=gnus;other=val, then select a specific viewer
2476
           with mimetype|tag=... in mimeview.
2194
2477
2195
   filtersdir
2478
   filtersdir
2196
2479
2197
           A directory to search for the external filter scripts used to
2480
           A directory to search for the external filter scripts used to
2198
           index some types of files. The value should not be changed, except
2481
           index some types of files. The value should not be changed, except
...
...
2201
2484
2202
   iconsdir
2485
   iconsdir
2203
2486
2204
           The name of the directory where recoll result list icons are
2487
           The name of the directory where recoll result list icons are
2205
           stored. You can change this if you want different images.
2488
           stored. You can change this if you want different images.
2206
2207
   guesscharset
2208
2209
           Decide if we try to guess the character set of files if no
2210
           internal value is available (ie: for plain text files). This does
2211
           not work well in general, and should probably not be used.
2212
2213
   usesystemfilecommand
2214
2215
           Decide if we use the file -i system command as a final step for
2216
           determining the mime type for a file (the main procedure uses
2217
           suffix associations as defined in the mimemap file). This can be
2218
           useful for files with suffix-less names, but it will also cause
2219
           the indexing of many bogus "text" files.
2220
2221
   indexedmimetypes
2222
2223
           Recoll normally indexes any file which it knows how to read. This
2224
           list lets you restrict the indexed mime types to what you specify.
2225
           If the variable is unspecified or the list empty (the default),
2226
           all supported types are processed.
2227
2228
   compressedfilemaxkbs
2229
2230
           Size limit for compressed (.gz or .bz2) files. These need to be
2231
           decompressed in a temporary directory for identification, which
2232
           can be very wasteful if 'uninteresting' big compressed files are
2233
           present. Negative means no limit, 0 means no processing of any
2234
           compressed file. Defaults to -1.
2235
2236
   indexallfilenames
2237
2238
           Recoll indexes file names in a special section of the database to
2239
           allow specific file names searches using wild cards. This
2240
           parameter decides if file name indexing is performed only for
2241
           files with mime types that would qualify them for full text
2242
           indexing, or for all files inside the selected subtrees,
2243
           independently of mime type.
2244
2489
2245
   idxabsmlen
2490
   idxabsmlen
2246
2491
2247
           Recoll stores an abstract for each indexed file inside the
2492
           Recoll stores an abstract for each indexed file inside the
2248
           database. The text can come from an actual 'abstract' section in
2493
           database. The text can come from an actual 'abstract' section in
...
...
2282
           This lets you adjust the size of n-grams used for indexing CJK
2527
           This lets you adjust the size of n-grams used for indexing CJK
2283
           text. The default value of 2 is probably appropriate in most
2528
           text. The default value of 2 is probably appropriate in most
2284
           cases. A value of 3 would allow more precision and efficiency on
2529
           cases. A value of 3 would allow more precision and efficiency on
2285
           longer words, but the index will be approximately twice as large.
2530
           longer words, but the index will be approximately twice as large.
2286
2531
2532
   guesscharset
2533
2534
           Decide if we try to guess the character set of files if no
2535
           internal value is available (ie: for plain text files). This does
2536
           not work well in general, and should probably not be used.
2537
2287
     ----------------------------------------------------------------------
2538
     ----------------------------------------------------------------------
2288
2539
2289
  7.4.2. The mimemap file
2540
  7.4.2. The mimemap file
2290
2541
2291
   mimemap specifies the file name extension to mime type mappings.
2542
   mimemap specifies the file name extension to mime type mappings.
...
...
2341
   non-default entries, which will override those from the central
2592
   non-default entries, which will override those from the central
2342
   configuration file.
2593
   configuration file.
2343
2594
2344
   Please note that these entries must be placed under a [view] section.
2595
   Please note that these entries must be placed under a [view] section.
2345
2596
2597
   The keys in the file are normally mime types. You can add an application
2598
   tag to specialize the choice for an area of the filesystem (using a
2599
   localfields specification in mimeconf). The syntax for the key is
2600
   mimetype|tag
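
   To make the tag mechanism concrete, here is a sketch that pairs a
   localfields setting with a tagged mimeview key. The directory, the "gnus"
   tag value and the viewer command are all hypothetical; only the
   rclaptg field name, the [view] section and the mimetype|tag key syntax
   come from this document. The two parts belong to different files, as the
   comments indicate.

```
# In the file that holds your localfields setting: tag every document
# under a hypothetical mail directory.
[~/Mail/gnus]
localfields = rclaptg=gnus

# In mimeview: entries live under [view]. The tagged key applies to
# documents carrying the rclaptg=gnus field.
# "my-gnus-viewer" is a made-up command name.
[view]
message/rfc822|gnus = my-gnus-viewer %f
```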
2601
2346
   If Use desktop preferences to choose document editor is checked in the
2602
   If Use desktop preferences to choose document editor is checked in the
2347
   user preferences, all mimeview entries will be ignored except the one
2603
   user preferences, all mimeview entries will be ignored except the one
2348
   labelled application/x-all (which is set to use xdg-open by default).
2604
   labelled application/x-all (which is set to use xdg-open by default).
2349
2605
2350
     ----------------------------------------------------------------------
2606
     ----------------------------------------------------------------------