a/src/README b/src/README
...

   Copyright (c) 2005 Jean-Francois Dockes

   This document introduces full text search notions and describes the
   installation and use of the Recoll application. It currently describes
   Recoll 1.12.

     ----------------------------------------------------------------------

   Table of Contents

...

                             2.4.2. Using cron to automate indexing

                2.5. Real time indexing

   3. Searching with the Qt graphical user interface

                3.1. Simple search

                3.2. The result list

...

                3.8. Multiple databases

                3.9. Document history

                3.10. Sorting search results and collapsing duplicates

                3.11. Search tips, shortcuts

                             3.11.1. Terms and search expansion

...

                             3.11.3. Others

                3.12. Customizing the search interface

   4. Searching with the KDE KIO slave

                4.1. What's this

                4.2. Searchable documents

   5. Searching on the command line

   6. Programming interface

                6.1. Writing a document filter

                             6.1.1. Filter HTML output

                6.2. Field data processing configuration

                6.3. API

                             6.3.1. Interface elements

                             6.3.2. Python interface

   7. Installation

                7.1. Installing a prebuilt copy

                             7.1.1. Installing through a package system

                             7.1.2. Installing a prebuilt Recoll

                7.2. Supporting packages

                7.3. Building from source

                             7.3.1. Prerequisites

                             7.3.2. Building

                             7.3.3. Installation

                7.4. Configuration overview

                             7.4.1. Main configuration file

                             7.4.2. The mimemap file

                             7.4.3. The mimeconf file

                             7.4.4. The mimeview file

                             7.4.5. Examples of configuration adjustments

                7.5. The KDE Kicker Recoll applet

     ----------------------------------------------------------------------

                            Chapter 1. Introduction

...
   interface, which will index your home directory by default, allowing you
   to search immediately after indexing completes.

   Do not do this if your home directory contains a huge number of documents
   and you do not want to wait or are very short on disk space. In this case,
   you may first want to customize the configuration to restrict the indexed
   area.

   Also be aware that you may need to install the appropriate supporting
   applications for document types that need them (for example antiword for
   ms-word files).
...
   documents in different languages in the same index is possible, and useful
   in practice, but does introduce possibilities of confusion. Recoll
   currently makes no attempt at automatic language recognition.

   Recoll has many parameters which define exactly what to index, and how to
   classify and decode the source documents. These are kept in configuration
   files. A default configuration is copied into a standard location (usually
   something like /usr/[local/]share/recoll/examples) during installation.
   The default parameters from these files may be overridden by values that
   you set inside your personal configuration, found by default in the
   .recoll sub-directory of your home directory. The default configuration
   will index your home directory with default parameters and should be
   sufficient for giving Recoll a try, but you may want to adjust it later.
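
   The override rule described above can be sketched as a simple layered
   lookup. This is a minimal illustration that assumes both files use plain
   "name = value" lines with "#" comments. The parser is hypothetical and
   ignores the rest of the real configuration syntax, such as subdirectory
   sections. The names topdirs and skippedNames are used as sample
   parameters, and their values are made up.

```javascript
// Minimal sketch (assumption): personal values override installed defaults,
// one parameter at a time. Real Recoll configuration files have more syntax
// (for example per-directory sections) that this parser does not handle.

function parseConf(text) {
  const conf = new Map();
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // blank or comment
    const eq = line.indexOf("=");
    if (eq < 0) continue; // not an assignment: ignored in this sketch
    conf.set(line.slice(0, eq).trim(), line.slice(eq + 1).trim());
  }
  return conf;
}

// Start from the defaults, then let every personal setting replace them.
function effectiveConf(defaultText, personalText) {
  const merged = parseConf(defaultText);
  for (const [name, value] of parseConf(personalText)) {
    merged.set(name, value);
  }
  return merged;
}

// Illustrative contents, not the real default file.
const defaultsText = "# installed defaults\ntopdirs = ~\nskippedNames = CVS\n";
const personalText = "topdirs = ~/Documents ~/mail\n";

const conf = effectiveConf(defaultsText, personalText);
console.log(conf.get("topdirs"));      // ~/Documents ~/mail
console.log(conf.get("skippedNames")); // CVS
```

   The personal value of topdirs replaces the default one. The skippedNames
   parameter is not set personally, so it keeps its default value.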

   Indexing is started automatically the first time you execute the recoll
   search graphical user interface. It can also be started by executing the
   recollindex command.

   Searches are performed inside the recoll program, which has many options
...

     ----------------------------------------------------------------------

  2.3.1. The indexing configuration GUI

   Most parameters for a given indexing configuration can be set from a
   recoll GUI running on this configuration (either as the default, or
   selected by setting RECOLL_CONFDIR or by using the -c option).
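
   The choice of configuration directory can be pictured with the short
   sketch below. It assumes the following order of precedence: the -c
   option first, then the RECOLL_CONFDIR environment variable, then the
   default .recoll directory in your home. The resolveConfDir() function is
   a hypothetical helper and is not part of Recoll.

```javascript
const os = require("os");
const path = require("path");

// Hypothetical helper: pick the configuration directory, assuming the
// precedence -c option > RECOLL_CONFDIR > ~/.recoll.
function resolveConfDir(argv, env) {
  const i = argv.indexOf("-c");
  if (i >= 0 && i + 1 < argv.length) {
    return argv[i + 1]; // "-c <dir>" given on the command line
  }
  if (env.RECOLL_CONFDIR) {
    return env.RECOLL_CONFDIR; // set in the environment
  }
  return path.join(os.homedir(), ".recoll"); // default location
}

console.log(resolveConfDir(["-c", "/srv/recoll-work"], {}));
console.log(resolveConfDir([], { RECOLL_CONFDIR: "/tmp/rclconf" }));
console.log(resolveConfDir([], {}));
```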

   The interface is started from the Preferences menu. It has two main
   panels. The first panel allows setting global variables, like the list of
   top directories or the list of skipped paths. The second panel allows
   setting variables that can be redefined for subdirectories. This second
...
   it if your system is short on resources. Periodic indexing is adequate in
   most cases.

     ----------------------------------------------------------------------

           Chapter 3. Searching with the Qt graphical user interface

   The recoll program provides the main user interface for searching. It is
   based on the QT library.

   recoll has two search modes:

     * Simple search (the default, on the main screen) has a single entry
       field where you can enter multiple words.
...
   contain embedded punctuation or other non-textual characters. For example,
   Recoll can handle things like e-mail addresses, or arbitrary cut and paste
   from another text window, punctuation and all.

   The main case where you should enter text differently from how it is
   printed is for East Asian languages (Chinese, Japanese, Korean). Words
   composed of single or multiple characters should be entered separated by
   white space in this case (they would typically be printed without white
   space).

     ----------------------------------------------------------------------

3.1. Simple search

    1. Start the recoll program.

    2. Possibly choose a search mode: Any term, All terms, File name or Query
       language.

    3. Enter search term(s) in the text field at the top of the window.

    4. Click the Search button or hit the Enter key to start the search.

...
   the terms appear.

   File name will specifically look for file names. The entry will be split
   at white space characters, and each pattern will be separately expanded.
   If you want to search for a pattern including white space, you need to
   enclose it in double quotes. The point of having a separate file name
   search is that wildcard expansion can be performed more efficiently on a
   relatively small subset of the index.
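
   The splitting and expansion steps can be sketched as follows. This is an
   illustration under stated assumptions. The entry is split at white space
   except inside double quotes, and each pattern is then matched on its own
   against a list of file names using shell-style wildcards (*, ?, [...]).
   Case-insensitive matching and the sample file names are assumptions.
   The functions splitFileNameEntry(), globToRegExp() and expandPattern()
   are hypothetical helpers, not Recoll code.

```javascript
// Split at white space, keeping double-quoted chunks as single patterns.
function splitFileNameEntry(entry) {
  const patterns = [];
  const re = /"([^"]*)"|(\S+)/g; // a quoted chunk, or a run of non-blanks
  let m;
  while ((m = re.exec(entry)) !== null) {
    patterns.push(m[1] !== undefined ? m[1] : m[2]);
  }
  return patterns;
}

// Translate a shell-style wildcard pattern (*, ?, [...]) to a RegExp.
// Case-insensitive matching is an assumption made for this example.
function globToRegExp(glob) {
  let out = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === "*") {
      out += ".*";
    } else if (c === "?") {
      out += ".";
    } else if (c === "[") {
      const close = glob.indexOf("]", i + 1);
      if (close < 0) {
        out += "\\["; // no closing bracket: treat "[" literally
        continue;
      }
      let cls = glob.slice(i + 1, close);
      if (cls.startsWith("!")) cls = "^" + cls.slice(1); // [!x] negation
      out += "[" + cls.replace(/\\/g, "\\\\") + "]";
      i = close;
    } else {
      out += c.replace(/[.+^${}()|\\]/g, "\\$&"); // escape regex syntax
    }
  }
  return new RegExp("^" + out + "$", "i");
}

// Expand one pattern against the known file names.
function expandPattern(pattern, fileNames) {
  const re = globToRegExp(pattern);
  return fileNames.filter((name) => re.test(name));
}

const names = ["report 2007.pdf", "report.txt", "notes.txt", "a1.c", "b2.c"];
for (const p of splitFileNameEntry('"report 2007*" *.txt [ab]?.c')) {
  console.log(p, "->", expandPattern(p, names));
}
```

   The quoted pattern "report 2007*" keeps its space and is expanded as a
   single unit. The other two patterns are expanded separately.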

   The fourth entry (Query Language) is described in its own section.

   All search modes allow wildcards inside terms (*, ?, []). You may want to
   have a look at the section about wildcards for more information about
...
   enclosing the input inside double quotes. Example: "virtual reality".

   Character case has no influence on search, except that you can disable
   stem expansion for any term by capitalizing it. For example, a search for
   floor will also normally look for flooring, floored, etc., but a search
   for Floor will only look for floor, in any character case. Stemming can
   also be disabled globally in the preferences.
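
   The capitalization rule can be illustrated with a toy expander. This is
   a sketch only. STEM_FAMILIES stands in for the stemming data that Recoll
   uses for expansion, and stemOf() is a hypothetical suffix stripper, not
   a real stemmer. Only the decision logic follows the text above.

```javascript
// Hypothetical stand-in for the stemming data: stem -> indexed variants.
const STEM_FAMILIES = new Map([
  ["floor", ["floor", "floors", "flooring", "floored"]],
]);

// Toy stemmer: strips a few English suffixes (illustration only).
function stemOf(word) {
  return word.replace(/(ing|ed|s)$/, "");
}

function expandTerm(term, stemmingEnabled = true) {
  if (term === "") return [];
  const lower = term.toLowerCase(); // matching ignores character case
  const capitalized = term[0] !== term[0].toLowerCase();
  if (!stemmingEnabled || capitalized) {
    return [lower]; // no expansion: the term itself, in any case
  }
  return STEM_FAMILIES.get(stemOf(lower)) || [lower];
}

console.log(expandTerm("floor"));        // the whole "floor" family
console.log(expandTerm("Floor"));        // [ 'floor' ]
console.log(expandTerm("floor", false)); // [ 'floor' ]
```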

   Recoll remembers the last few searches that you performed. You can use the
   simple search text entry widget (a combobox) to recall them (click on the
   arrow at the right of the text field). Please note, however, that only the
   search texts are remembered, not the mode (all/any/file name).
...
   open tabs in the existing preview window. You can use Shift+Click to force
   the creation of another preview window, which may be useful to view the
   documents side by side. (You can also browse successive results in a
   single preview window by typing Shift+ArrowUp/Down in the window.)

   Clicking the Edit link will attempt to start an external editor. The
   editors can be configured through the user preferences dialog, or by
   editing the mimeview configuration file.

   The Preview and Edit links may not be present for all entries, meaning
   that Recoll has no configured way to preview a given file type (which was
   indexed by name only), or no configured external editor for the file
   type. This can sometimes be adjusted simply by tweaking the mimemap and
   mimeview configuration files (the latter can be modified with the user
   preferences dialog).

   The format of the result list entries is entirely configurable by using
   the preferences dialog to edit an HTML fragment.

   You can click on the Query details link at the top of the results page to
   see the query actually performed, after stem expansion and other
   processing.

   Double-clicking on any word inside the result list or a preview window
...

     * Copy File Name

     * Copy Url

     * Save to File

     * Find similar

     * Parent document

   The Preview and Edit entries do the same thing as the corresponding links.

   The Copy File Name and Copy Url entries copy the relevant data to the
   clipboard, for later pasting.

   Save to File allows saving the contents of a result document to a chosen
   file. This entry will only appear if the document does not correspond to
   an existing file, but is a subdocument inside such a file (for example an
   email attachment). It is especially useful to extract attachments with no
   associated editor.

   The Find similar entry will select a number of relevant terms from the
   current document and enter them into the simple search field. You can then
   start a simple search, with a good chance of finding documents related to
   the current result.
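
   Here is a rough sketch of a term picker that suggests the kind of thing
   Find similar does. Recoll's actual selection criteria are not described
   in this section. This stand-in simply keeps the most frequent words of
   the document and skips a small stop-word list. The pickSimilarTerms()
   function, the stop words and the sample text are all hypothetical.

```javascript
// Hypothetical stop-word list; real selection criteria may differ.
const STOP_WORDS = new Set(["the", "a", "an", "and", "of", "to", "in", "is"]);

// Return the `count` most frequent non-trivial words of a text.
function pickSimilarTerms(text, count = 3) {
  const freq = new Map();
  for (const word of text.toLowerCase().match(/\p{L}+/gu) || []) {
    if (STOP_WORDS.has(word) || word.length < 3) continue;
    freq.set(word, (freq.get(word) || 0) + 1);
  }
  // Sort by decreasing frequency, then alphabetically for stable output.
  return [...freq.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, count)
    .map(([word]) => word);
}

const doc = "Indexing the mail folders: recoll indexes mail, attachments " +
  "and mail folders so that attachments can be searched.";
// Terms that could be entered into the simple search field:
console.log(pickSimilarTerms(doc).join(" ")); // mail attachments folders
```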
...
   If you have a search string entered and you use ^Up/^Down to browse the
   results, the search is initiated for each successive document. If the
   string is found, the cursor will be positioned at the first occurrence of
   the search string.

   A right-click menu in the text area allows switching between displaying
   the main text or the contents of fields associated with the document (for
   example author, abstract, etc.). This is especially useful in cases where
   the term match did not occur in the main text but in one of the fields.

     ----------------------------------------------------------------------

3.4. The query language

   The query language processor is activated on the simple search entry when
...

     ----------------------------------------------------------------------

3.5. Complex/advanced search

   The advanced search dialog helps you build more complex queries. It can be
   opened through the Tools menu or through the main toolbar.

   The dialog has three parts:

     * The top part allows constructing a query by combining multiple clauses
       of different types. Each entry field is configurable for the following
       modes:

          * All terms.

          * Any term.

          * None of the terms.

          * Phrase (exact terms in order within an adjustable window).

          * Proximity (terms in any order within an adjustable window).

          * Filename search.

       Additional entry fields can be created by clicking the Add clause
       button.

       When searching, the non-empty clauses will be combined either with an
       AND or an OR conjunction, depending on the choice made on the left
       (All clauses or Any clause).

       Entries of all types except "Phrase" and "Proximity" accept a mix of
       single words and phrases enclosed in double quotes. Stemming and
       wildcard expansion will be performed as for simple search.

     * The next part allows filtering the results by their MIME types.

       The state of the file type selection can be saved as the default (the
       file type filter will not be activated at program start-up, but the
       lists will be in the restored state).

     * The bottom part allows restricting the search results to a sub-tree of
       the indexed area. If you need to do this often, you may consider
       setting up multiple indexes instead, as the performance will be much
       better.
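
   The way the dialog combines clauses can be sketched with sets of
   document ids. This is an illustration only. It assumes that every clause
   has already been evaluated to a set of matching documents, and it
   ignores clauses left empty. It also assumes that "None of the terms"
   clauses act as exclusions applied after the AND/OR step. The clause
   objects and combineClauses() are hypothetical.

```javascript
// Combine evaluated clauses with AND or OR; "none" clauses exclude documents.
function combineClauses(clauses, conjunction) {
  // Clauses with no text entered are ignored.
  const active = clauses.filter((c) => c.text.trim() !== "");
  const positive = active.filter((c) => c.type !== "none");
  const negative = active.filter((c) => c.type === "none");

  let result;
  if (positive.length === 0) {
    result = new Set(); // only exclusions: nothing matches in this sketch
  } else if (conjunction === "AND") {
    result = new Set([...positive[0].docs].filter(
      (d) => positive.every((c) => c.docs.has(d))));
  } else {
    result = new Set(positive.flatMap((c) => [...c.docs]));
  }
  for (const c of negative) {
    for (const d of c.docs) result.delete(d); // remove excluded documents
  }
  return result;
}

const clauses = [
  { type: "all", text: "recoll index", docs: new Set([1, 2, 3]) },
  { type: "any", text: "xapian qt", docs: new Set([2, 3, 4]) },
  { type: "none", text: "kde", docs: new Set([3]) },
  { type: "phrase", text: "", docs: new Set() }, // left empty: ignored
];
console.log([...combineClauses(clauses, "AND")]); // [ 2 ]
console.log([...combineClauses(clauses, "OR")]);  // [ 1, 2, 4 ]
```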

   Phrases and Proximity searches. These two clauses work in similar ways,
   with the difference that proximity searches do not impose an order on the
   words. In both cases, an adjustable number (slack) of non-matched words
   may be accepted between the searched ones (use the counter on the left to
   adjust this count). For phrases, the default count is zero (exact match).
   For proximity it is ten (meaning that two search terms would be matched
   if found within a window of twelve words). Examples: a phrase search for
   quick fox with a slack of 0 will match quick fox but not quick brown fox.
   With a slack of 1 it will match the latter, but not fox quick. A proximity
   search for quick fox with the default slack will match the latter, and
   also a fox is a cunning and quick animal.
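
   The slack rules and examples above can be checked with a small matcher.
   This is a simplified sketch, not Recoll's (or Xapian's) implementation.
   It splits text into lowercase words and treats slack as the number of
   extra words allowed. For proximity it uses a window of (number of terms
   + slack) words, which gives the twelve-word window mentioned above for
   two terms with the default slack of ten. The names phraseMatch() and
   proximityMatch() are hypothetical.

```javascript
// Split text into lowercase words (letters or digits).
function tokenize(text) {
  return text.toLowerCase().match(/\p{L}+|\p{N}+/gu) || [];
}

// Terms must appear in order, with at most `slack` extra words in total.
function phraseMatch(text, terms, slack = 0) {
  const words = tokenize(text);
  for (let start = 0; start < words.length; start++) {
    if (words[start] !== terms[0]) continue;
    let pos = start;
    let found = true;
    for (let t = 1; t < terms.length && found; t++) {
      // Earliest next occurrence of the following term keeps the span minimal.
      let next = pos + 1;
      while (next < words.length && words[next] !== terms[t]) next++;
      if (next >= words.length) found = false;
      pos = next;
    }
    const extraWords = pos - start + 1 - terms.length;
    if (found && extraWords <= slack) return true;
  }
  return false;
}

// Terms may appear in any order inside a window of terms.length + slack
// words. (Repeated search terms are not handled specially in this sketch.)
function proximityMatch(text, terms, slack = 10) {
  const words = tokenize(text);
  const size = terms.length + slack;
  for (let start = 0; start < words.length; start++) {
    const inWindow = new Set(words.slice(start, start + size));
    if (terms.every((t) => inWindow.has(t))) return true;
  }
  return false;
}

const q = ["quick", "fox"];
console.log(phraseMatch("quick fox", q, 0));       // true
console.log(phraseMatch("quick brown fox", q, 0)); // false
console.log(phraseMatch("quick brown fox", q, 1)); // true
console.log(phraseMatch("fox quick", q, 1));       // false
console.log(proximityMatch("fox quick", q));       // true
console.log(proximityMatch("a fox is a cunning and quick animal", q)); // true
```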

   Click on the Start Search button in the advanced search dialog, or press
   Enter in any text field to start the search. The button in the main window
   always performs a simple search.

...
   You can erase the document history by using the Erase document history
   entry in the File menu.

     ----------------------------------------------------------------------

3.10. Sorting search results and collapsing duplicates

   The documents in a result list are normally sorted in order of relevance.
   It is possible to specify different sort parameters by using the Sort
   parameters dialog (located in the Tools menu).

...

   Sort parameters are remembered between program invocations, but result
   sorting is normally inactive when the program starts. It is possible to
   keep the sorting activation state between program invocations by
   checking the Remember sort activation state option in the preferences.

   It is also possible to hide duplicate entries inside the result list
   (documents with the exact same contents as the displayed one). The test of
   identity is based on an MD5 hash of the document container, not only of
   the text contents (so that, for example, a text document with an image
   added will not be a duplicate of the text only). Duplicate hiding is
   controlled by an entry in the Query configuration dialog, and is off by
   default.
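
   Duplicate hiding can be pictured as keeping only the first result for
   each distinct MD5 digest of the container bytes. The sketch below
   computes the digest with the Node.js crypto module. The result objects,
   their fields and the sample paths are made up for the example. The code
   only illustrates the identity test described above and does not
   reproduce how Recoll stores or compares hashes.

```javascript
const crypto = require("crypto");

// Hex MD5 digest of a document container's raw bytes.
function md5Hex(bytes) {
  return crypto.createHash("md5").update(bytes).digest("hex");
}

// Keep the first (best ranked) result for each distinct digest; later results
// with identical container bytes are hidden as duplicates.
function collapseDuplicates(results) {
  const seen = new Set();
  return results.filter((r) => {
    const digest = md5Hex(r.bytes);
    if (seen.has(digest)) return false;
    seen.add(digest);
    return true;
  });
}

// Hypothetical results, already sorted by relevance.
const results = [
  { url: "file:///home/me/a/notes.txt", bytes: Buffer.from("meeting notes\n") },
  { url: "file:///home/me/b/notes.txt", bytes: Buffer.from("meeting notes\n") },
  { url: "file:///home/me/c/notes.odt", bytes: Buffer.from("fake odt bytes") },
];
console.log(collapseDuplicates(results).map((r) => r.url));
```

   The second result has the same bytes as the first one, so it is hidden.
   The third result has different container contents and is kept.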

     ----------------------------------------------------------------------

3.11. Search tips, shortcuts

...

  3.11.2. Working with phrases and proximity

   Phrases and Proximity searches. A phrase can be looked for by enclosing it
   in double quotes. Example: "user manual" will look only for occurrences of
   user immediately followed by manual. You can use a Phrase clause in the
   advanced search dialog to the same effect. Phrases can be entered along
   with simple terms in all simple or advanced search entry fields (except
   Phrase clauses).

   AutoPhrases. This option can be set in the preferences dialog. If it is
   set, a phrase will be automatically built and added to simple searches
   when looking for Any terms. This will not radically change the results,
   but will give a relevance boost to the results where the search terms
...

   User interface parameters:

     * Number of results in a result page:

     * Hide duplicate results: decides if result list entries are shown for
       identical documents found in different places.

     * Highlight color for query terms: Terms from the user query are
       highlighted in the result list samples and the preview window. The
       color can be chosen here. Any QT color string should work (for example
       red, #ff0000). The default is blue.

...
   alternative indexer may also need to implement a way of purging the index
   from stale data.

     ----------------------------------------------------------------------

                  Chapter 4. Searching with the KDE KIO slave

4.1. What's this

   The Recoll KIO slave allows performing a Recoll search by entering an
   appropriate URL in a KDE open dialog, or with an HTML-based interface
   displayed in Konqueror.

   The HTML-based interface is similar to the QT-based interface, but
   slightly less powerful for now. Its advantage is that you can perform your
   search while staying fully within the KDE framework: drag and drop from
   the result list works normally and you have your normal choice of
   applications for opening files.

   The alternative interface uses a directory view of search results. Due to
   limitations in the current KIO slave interface, it is currently not
   obviously useful (to me).

   The interface is described in more detail inside a help file which you can
   access by entering recoll:/ inside the Konqueror URL line (this works only
   if the recoll KIO slave has been previously installed).

   The instructions for building this module are located in the source tree.
   See: kde/kio/recoll/00README.txt

     ----------------------------------------------------------------------

4.2. Searchable documents

   As a sample application, the Recoll KIO slave could allow preparing a set
   of HTML documents (for example a manual) so that they become their own
   search interface inside Konqueror.

   This can be done by either explicitly inserting <a href="recoll:/...">
   links around some document areas, or automatically by adding a very small
   JavaScript program to the documents, like the following example, which
   would initiate a search by double-clicking any term:
1360
1361
 <script language="JavaScript">
1362
     function recollsearch() {
1363
         var t = document.getSelection();
1364
         window.location.href = 'recoll://search/query?qtp=a&p=0&q=' +
1365
             encodeURIComponent(t);
1366
     }
1367
 </script>
1368
  ....
1369
 <body ondblclick="recollsearch()">
1370
1371
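   The JavaScript above only appends the URL-encoded selection to a fixed
   recoll://search/query prefix. The Python sketch below performs the same
   construction, for example to generate such links while preparing the HTML
   documents. It is an illustration, not part of Recoll: the function name is
   hypothetical, it reuses only the qtp, p and q parameters that appear in the
   example, and it assumes that urllib.parse.quote with the extra safe
   characters !*'() is an adequate stand-in for encodeURIComponent.

```python
from urllib.parse import quote

# urllib.parse.quote never escapes letters, digits and "_.-~". JavaScript's
# encodeURIComponent additionally leaves these characters unescaped:
JS_EXTRA_SAFE = "!*'()"


def recoll_search_url(selection: str) -> str:
    """Return the recoll:// URL that the JavaScript example would open.

    The qtp=a and p=0 parameters are copied verbatim from that example.
    """
    encoded = quote(selection, safe=JS_EXTRA_SAFE)
    return "recoll://search/query?qtp=a&p=0&q=" + encoded
```

   For example, recoll_search_url("full text search") returns
   recoll://search/query?qtp=a&p=0&q=full%20text%20search.
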
     ----------------------------------------------------------------------

                    Chapter 5. Searching on the command line

   There are several ways to obtain search results as a text stream, without
   a graphical interface:

     * By passing option -t to the recoll program.

     * By using the recollq program.

     * By writing a custom Python program, using the Recoll Python API.

   The first two methods work in the same way and accept and require the same
   arguments (except for the additional -t to recoll). The query to be
   executed is specified as command line arguments.

   recollq is not built by default. You can use the Makefile in the query
   directory to build it. This is a very simple program, and it will often be
   useful to tailor its output format to your needs.

   recollq has a man page (not installed by default, look in the doc/man
   directory). The usage string is as follows:

 recollq [-o|-a|-f] <query string>
  Runs a recoll query and displays result lines.
   Default: will interpret the argument(s) as a query language string
   -o Emulate the gui simple search in ANY TERM mode
   -a Emulate the gui simple search in ALL TERMS mode
   -f Emulate the gui simple search in filename mode
 Common options:
     -c <configdir> : specify config directory, overriding $RECOLL_CONFDIR
     -d also dump file contents
     -n <cnt> limit the maximum number of results (0->no limit, default 2000)
     -b : basic. Just output urls, no mime types or titles
     -m : dump the whole document meta[] array
     -S fld : sort by field name
     -D : sort descending

   Sample execution:

 recollq 'ilur -nautique mime:text/html'
 Recoll query: ((((ilur:(wqf=11) OR ilurs) AND_NOT (nautique:(wqf=11)
   OR nautiques OR nautiqu OR nautiquement)) FILTER Ttext/html))
 4 results
 text/html   [file:///Users/uncrypted-dockes/projets/bateaux/ilur/comptes.html]  [comptes.html]  18593   bytes
 text/html   [file:///Users/uncrypted-dockes/projets/nautique/webnautique/articles/ilur1/index.html] [Constructio...
 text/html   [file:///Users/uncrypted-dockes/projets/pagepers/index.html]    [psxtcl/writemime/recoll]...
 text/html   [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/recu-chasse-maree....

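   Since recollq writes plain text, its output is easy to post-process from a
   script. The following Python sketch runs recollq and splits the result
   lines of the default format shown in the sample (mime type, bracketed URL,
   bracketed title, size). It is only a sketch: it uses just the -c and -n
   options from the usage string, assumes that options precede the query as
   separate arguments, and assumes that titles contain no closing bracket.
   Lines that do not match, such as the query echo and the result count, are
   ignored. The names Result, parse_results and run_recollq are hypothetical.

```python
import re
import subprocess
from typing import Iterator, List, NamedTuple, Optional

# One result line of the default recollq output, as in the sample above:
#   mime/type   [url]  [title]  size   bytes
# Assumption: the title contains no "]" character.
RESULT_LINE = re.compile(
    r"^(?P<mime>\S+)\s+\[(?P<url>[^\]]*)\]\s+\[(?P<title>[^\]]*)\]"
    r"\s+(?P<size>\d+)\s+bytes\s*$"
)


class Result(NamedTuple):
    mime: str
    url: str
    title: str
    size: int


def parse_results(output: str) -> Iterator[Result]:
    """Yield the result lines; other lines (query echo, count) are skipped."""
    for line in output.splitlines():
        m = RESULT_LINE.match(line)
        if m:
            yield Result(m["mime"], m["url"], m["title"], int(m["size"]))


def run_recollq(query: str, confdir: Optional[str] = None,
                max_results: Optional[int] = None) -> List[Result]:
    """Run recollq (which must be in PATH) and parse its standard output."""
    args = ["recollq"]
    if confdir is not None:
        args += ["-c", confdir]           # overrides $RECOLL_CONFDIR
    if max_results is not None:
        args += ["-n", str(max_results)]  # 0 means no limit
    args.append(query)
    proc = subprocess.run(args, capture_output=True, text=True, check=True)
    return list(parse_results(proc.stdout))
```

   For example, run_recollq('ilur -nautique mime:text/html') would return one
   Result per matching document, with the size as an integer.
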
     ----------------------------------------------------------------------

                        Chapter 6. Programming interface

   Recoll has an application programming interface (API), usable both for
   indexing and searching, currently accessible from the Python language.

   Another less radical way to extend the application is to write filters for
...
   The processing of metadata attributes for documents (fields) is highly
   configurable.

     ----------------------------------------------------------------------

6.1. Writing a document filter

   Recoll filters are executable programs which translate from a specific
   format (e.g. OpenOffice, Acrobat) to the Recoll indexing input format,
   which may be text/plain or text/html.

...
   cannot specify the character set and other metadata, so they are limited
   to cases where these elements are not needed.

     ----------------------------------------------------------------------

  6.1.1. Filter HTML output

   The output HTML can be very minimal, as in the following example:

 <html><head>
 <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
...
   See the following section for details about configuring how field data is
   processed by the indexer.

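   As an illustration of the filter interface described in this section,
   here is a minimal Python filter producing HTML output of the kind shown
   above. It is a sketch under stated assumptions, not one of the filters
   shipped with Recoll: it assumes that the filter receives the input file
   path as its only argument and writes the HTML document on its standard
   output, and its "extraction" step (decoding the file as Latin-1 text) is a
   placeholder for real format-specific code. The points that matter are the
   UTF-8 charset declaration and the HTML escaping of the extracted text.

```python
import html
import os
import sys
from typing import List


def to_minimal_html(text: str, title: str = "") -> str:
    """Wrap extracted text in a minimal HTML document declared as UTF-8."""
    return (
        "<html><head>\n"
        '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head><body>\n"
        f"<pre>{html.escape(text)}</pre>\n"
        "</body></html>\n"
    )


def main(argv: List[str]) -> int:
    """Filter entry point. Assumption: argv[1] is the input file path."""
    if len(argv) != 2:
        sys.stderr.write(f"usage: {argv[0]} <input file>\n")
        return 1
    path = argv[1]
    with open(path, "rb") as f:
        raw = f.read()
    # Placeholder extraction: a real filter would decode its specific format
    # here. Latin-1 is only an example of a source character set.
    text = raw.decode("latin-1")
    html_doc = to_minimal_html(text, os.path.basename(path))
    # Encode explicitly so the bytes match the charset declared above.
    sys.stdout.buffer.write(html_doc.encode("utf-8"))
    return 0
```

   To install it as a filter, save it as an executable script starting with a
   #!/usr/bin/env python3 line and ending with sys.exit(main(sys.argv)).
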
     ----------------------------------------------------------------------

6.2. Field data processing configuration

   Fields are named pieces of information in or about documents, such as
   title, author, or abstract.

   The field values for documents can appear in several ways during indexing:
...
   A field becomes stored by appearing in the [stored] section of the fields
   file.

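   To check which fields a given configuration stores, a script could read
   the [stored] section of the fields file. The Python sketch below does this
   with the standard configparser module. It assumes that the file follows
   the usual name = value syntax with [section] headers and that [stored]
   entries are field names with empty values; configparser is not Recoll's
   own parser (it lower-cases names, for instance), so treat the result as an
   approximation. The function name stored_fields is hypothetical.

```python
import configparser
from typing import List


def stored_fields(fields_text: str) -> List[str]:
    """Return the names listed in the [stored] section of a fields file."""
    parser = configparser.ConfigParser(allow_no_value=True,
                                       interpolation=None, strict=False)
    # configparser refuses entries placed before the first section header,
    # so put any such lines in a dummy section.
    parser.read_string("[__top__]\n" + fields_text)
    if not parser.has_section("stored"):
        return []
    return list(parser.options("stored"))
```

   Called on the text of a fields file whose [stored] section contains
   author = and recipient = lines, it would return ['author', 'recipient'].
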
     ----------------------------------------------------------------------

6.3. API

  6.3.1. Interface elements

   A few elements in the interface are specific and need an explanation.

   udi

...
   during indexing. The main indexer documents would also probably be a
   problem for the external indexer purge operation.

     ----------------------------------------------------------------------

  6.3.2. Python interface

    6.3.2.1. Introduction

   Recoll versions after 1.11 define a Python programming interface, both for
   searching and indexing.

   The Python interface is not built by default and can be found in the
...
         python setup.py install

     ----------------------------------------------------------------------

    6.3.2.2. Interface manual

   NAME
       recoll - This is an interface to the Recoll full text indexer.

   FILE
...

     ----------------------------------------------------------------------

    6.3.2.3. Example code

   The following sample queries the index with a user language string.
   See the python/samples directory inside the Recoll source for other
   examples.

...

     ----------------------------------------------------------------------

                            Chapter 7. Installation

7.1. Installing a prebuilt copy

   Recoll binary packages from the Recoll web site are always linked
   statically to the Xapian libraries, and have no other dependencies. You
   will only have to check or install supporting applications for the file
   types that you want to index beyond text, HTML and mail files, and maybe
   have a look at the configuration section (but this may not be necessary
   for a quick test with default parameters).

     ----------------------------------------------------------------------

  7.1.1. Installing through a package system

   If you use a BSD-type port system or a prebuilt package (RPM or other),
   just follow the usual procedure for your system.

     ----------------------------------------------------------------------

  7.1.2. Installing a prebuilt Recoll

   The unpackaged binary versions on the Recoll web site are just compressed
   tar files of a build tree, where only the useful parts were kept
   (executables and sample configuration).

...
   had built the package from source (that is, just type make install). The
   binary trees are built for installation to /usr/local.

     ----------------------------------------------------------------------

7.2. Supporting packages

   Recoll uses external applications to index some file types. You need to
   install them for the file types that you wish to have indexed (these are
   run-time dependencies; none is needed for building Recoll).

...
   Text, HTML, mail folder, Openoffice and Scribus files are processed
   internally. Lyx is used to index Lyx files. Many filters need sed and awk.

     ----------------------------------------------------------------------

7.3. Building from source

  7.3.1. Prerequisites

   At the very least, you will need to download and install the Xapian core
   package (Recoll 1.9 normally uses version 1.0.2, but any 0.9 or 1.0.x
   version will work too), and the Qt run-time and development packages
   (Recoll development currently uses version 3.3.5, but any 3.3 version is
...
   not be critical). On Linux systems, the iconv interface is part of libc
   and you should not need to do anything special.

     ----------------------------------------------------------------------

  7.3.2. Building

   Recoll has been built on Linux (Red Hat 7.3, Mandriva 2005/6, Fedora Core
   3/4/5/6), FreeBSD 5/6, Mac OS X, and Solaris 8. If you build on another
   system and need to modify things, I would very much welcome patches.

...
   manually copy and modify one of the existing files (the new file name
   should be the output of uname -s).

     ----------------------------------------------------------------------

  7.3.3. Installation

   Either type make install or execute recollinstall prefix, in the root of
   the source tree. This will copy the commands to prefix/bin and the sample
   configuration files, scripts and other shared data to prefix/share/recoll.

...

   You can then proceed to configuration.

     ----------------------------------------------------------------------

7.4. Configuration overview

   Most of the parameters specific to the recoll GUI are set through the
   Preferences menu and stored in the standard QT place ($HOME/.qt/recollrc).
   You probably do not want to edit this by hand.

   Recoll indexing options are set inside text configuration files located in
   a configuration directory. There can be several such directories, each of
   which defines the parameters for one index.

   The configuration files can be edited by hand or through the Indexing
   configuration dialog (Preferences menu). The GUI tool will try to respect
   your formatting and comments as much as possible, so it is quite possible
   to use both ways.

   The most accurate documentation for the configuration parameters is given
   by comments inside the default files, and we will just give a general
   overview here.

   For each index, there are two sets of configuration files. System-wide
   configuration files are kept in a directory named like
   /usr/[local/]share/recoll/examples, and define default values shared by
   all indexes. For each index, a parallel set of files defines the
   customized parameters.

   The default location of the configuration is the .recoll directory in your
   home. Most people will only use this directory.

   This location can be changed, or others can be added, with the
   RECOLL_CONFDIR environment variable or the -c option parameter to recoll
   and recollindex.

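   The order in which the configuration directory is chosen can be summarized
   in a few lines of Python. This is a hypothetical helper for scripts that
   drive recoll or recollindex, not Recoll code: it applies the -c option
   first, then RECOLL_CONFDIR (which -c overrides, per the recollq usage
   string), then the default .recoll directory in your home. Treating an
   empty RECOLL_CONFDIR as unset is a choice of this sketch.

```python
import os
from typing import Mapping, Optional


def resolve_confdir(cli_confdir: Optional[str] = None,
                    environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the configuration directory a Recoll command would use.

    Order: the -c option value, then $RECOLL_CONFDIR, then ~/.recoll.
    """
    env = os.environ if environ is None else environ
    if cli_confdir:
        chosen = cli_confdir
    elif env.get("RECOLL_CONFDIR"):
        chosen = env["RECOLL_CONFDIR"]
    else:
        chosen = os.path.join("~", ".recoll")
    return os.path.abspath(os.path.expanduser(chosen))
```

   For example, resolve_confdir(None, {"RECOLL_CONFDIR": "/srv/recoll"})
   returns /srv/recoll, while passing "/tmp/idx1" as the first argument
   returns /tmp/idx1 whatever the environment says.
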
   If the .recoll directory does not exist when recoll or recollindex is
   started, it will be created with a set of empty configuration files.
   recoll will give you a chance to edit the configuration file before
   starting indexing. recollindex will proceed immediately. To avoid
...
   White space is used for separation inside lists. List elements with
   embedded spaces can be quoted using double-quotes.

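   The list syntax is simple enough to reproduce when a script needs to read
   a list-valued parameter. The Python sketch below splits a value on white
   space and treats a double-quoted run as a single element. It approximates
   the rule stated above and is not Recoll's parser: keeping an unbalanced
   quote as literal text is a choice of this sketch, and the function name
   split_list_value is hypothetical.

```python
import re
from typing import List

# Either a double-quoted element (quotes stripped) or a run of non-blanks.
ELEMENT = re.compile(r'"([^"]*)"|(\S+)')


def split_list_value(value: str) -> List[str]:
    """Split a list parameter value on white space, honouring double quotes."""
    items = []
    for m in ELEMENT.finditer(value):
        quoted, bare = m.group(1), m.group(2)
        # group(1) is None unless the quoted alternative matched.
        items.append(quoted if quoted is not None else bare)
    return items
```

   For example, split_list_value('~/docs "/home/me/My Documents" /tmp')
   returns ['~/docs', '/home/me/My Documents', '/tmp'].
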
     ----------------------------------------------------------------------

  7.4.1. Main configuration file

   recoll.conf is the main configuration file. It defines things like what to
   index (top directories and things to ignore), and the default character
   set to use for document types which do not specify it internally.

...

           Recoll normally indexes any file which it knows how to read. This
           list lets you restrict the indexed mime types to what you specify.
           If the variable is unspecified or the list empty (the default),
           all supported types are processed.

   compressedfilemaxkbs

           Size limit, in kilobytes, for compressed (.gz or .bz2) files.
           These need to be decompressed in a temporary directory for
           identification, which can be very wasteful if 'uninteresting' big
           compressed files are present. Negative means no limit, 0 means no
           processing of any compressed file. Defaults to -1.

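   The rules stated for the indexed mime type list and for
   compressedfilemaxkbs reduce to two small predicates, sketched below in
   Python for clarity. The function names are hypothetical, sizes are taken
   to be in kilobytes as the parameter name suggests, and whether a file
   exactly at the limit is processed is an assumption of this sketch.

```python
from typing import Optional, Sequence


def mime_is_indexed(mime: str,
                    restrict_to: Optional[Sequence[str]] = None) -> bool:
    """An unspecified or empty restriction list lets every supported type in."""
    return not restrict_to or mime in restrict_to


def should_process_compressed(size_kb: float,
                              compressedfilemaxkbs: int = -1) -> bool:
    """Apply the compressedfilemaxkbs rule to one compressed file."""
    if compressedfilemaxkbs < 0:      # negative (the default -1): no limit
        return True
    if compressedfilemaxkbs == 0:     # 0: never process compressed files
        return False
    # Positive: process files up to the limit (inclusive, by assumption).
    return size_kb <= compressedfilemaxkbs
```
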
   indexallfilenames

           Recoll indexes file names in a special section of the database to
           allow specific file name searches using wildcards. This
...
           cases. A value of 3 would allow more precision and efficiency on
           longer words, but the index will be approximately twice as large.

     ----------------------------------------------------------------------

  7.4.2. The mimemap file

   mimemap specifies the file name extension to mime type mappings.

   For file names without an extension, or with an unknown one, the system's
   file -i command will be executed to determine the mime type (this can be
...
   given Recoll version. Having it there avoids cluttering the more
   user-oriented and locally customized skippedNames.

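   The mime type identification described for mimemap (an extension lookup
   first, the file -i command only as a fallback) can be sketched in Python
   as follows. The extension table is illustrative only; the real mappings
   are in the mimemap file. Lower-casing the extension is a choice of this
   sketch, the function names are hypothetical, and the parsing assumes the
   "path: type; charset=..." output printed by the common GNU/Linux version
   of file.

```python
import os
import subprocess
from typing import Callable, Mapping, Optional

# Illustrative extension table; the real entries live in the mimemap file.
EXAMPLE_MIMEMAP: Mapping[str, str] = {
    ".txt": "text/plain",
    ".html": "text/html",
    ".pdf": "application/pdf",
}


def file_command_mime(path: str) -> Optional[str]:
    """Ask `file -i` for the mime type, or return None if that fails."""
    try:
        out = subprocess.run(["file", "-i", path], capture_output=True,
                             text=True, check=True).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None
    if out.startswith(path + ":"):
        out = out[len(path) + 1:]
    mime = out.split(";", 1)[0].strip()
    return mime or None


def guess_mime(path: str,
               mimemap: Mapping[str, str] = EXAMPLE_MIMEMAP,
               fallback: Callable[[str], Optional[str]] = file_command_mime
               ) -> Optional[str]:
    """Extension lookup first; run the fallback only when that fails."""
    ext = os.path.splitext(path)[1].lower()
    if ext in mimemap:
        return mimemap[ext]
    return fallback(path)
```
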
     ----------------------------------------------------------------------

  7.4.3. The mimeconf file

   mimeconf specifies how the different mime types are handled for indexing,
   and which icons are displayed in the recoll result lists.

   Changing the parameters in the [index] section is probably not a good idea
...
   recoll in the result lists (the values are the basenames of the png images
   inside the iconsdir directory, which is specified in recoll.conf).

     ----------------------------------------------------------------------

  7.4.4. The mimeview file

   mimeview specifies which programs are started when you click on an Edit
   link in a result list. For example, HTML is normally displayed using
   firefox, but you may prefer Konqueror, and your openoffice.org program
   might be named ooffice instead of openoffice.
...
   user preferences, all mimeview entries will be ignored except the one
   labelled application/x-all (which is set to use xdg-open by default).

     ----------------------------------------------------------------------

  7.4.5. Examples of configuration adjustments

    7.4.5.1. Adding an external viewer for a non-indexed type

   Imagine that you have some kind of file which does not have indexable
   content, but for which you would like to have a functional Edit link in
   the result list (when found by file name). The file names end in .blob and
   can be displayed by application blobviewer.
...
   The entries you add in your personal file override those in the central
   configuration, which you do not need to alter.

     ----------------------------------------------------------------------

    7.4.5.2. Adding indexing support for a new file type

   Let us now imagine that the above .blob files actually contain indexable
   text and that you know how to extract it with a command line program.
   Getting Recoll to index the files is easy. You need to perform the above
   alteration, and also to add data to the mimeconf file (typically in
...
   The filter programming section describes in more detail how to write a
   filter.

     ----------------------------------------------------------------------

7.5. The KDE Kicker Recoll applet

   The Recoll source tree contains the source code to the recoll_applet, a
   small application derived from the find_applet. This can be used to add a
   small Recoll launcher to the KDE panel.
