a/src/README b/src/README
...
...
10
10
11
   Copyright (c) 2005 Jean-Francois Dockes
11
   Copyright (c) 2005 Jean-Francois Dockes
12
12
13
   This document introduces full text search notions and describes the
13
   This document introduces full text search notions and describes the
14
   installation and use of the Recoll application. It currently describes
14
   installation and use of the Recoll application. It currently describes
15
   Recoll 1.12-1.13.
15
   Recoll 1.14.
16
16
17
   [ Split HTML / Single HTML ]
17
   [ Split HTML / Single HTML ]
18
18
19
     ----------------------------------------------------------------------
19
     ----------------------------------------------------------------------
20
20
...
...
50
50
51
                             2.5.2. Using cron to automate indexing
51
                             2.5.2. Using cron to automate indexing
52
52
53
                2.6. Real time indexing
53
                2.6. Real time indexing
54
54
55
   3. Searching
56
55
   3. Searching with the Qt graphical user interface
57
                3.1. Searching with the Qt graphical user interface
56
58
57
                3.1. Simple search
59
                             3.1.1. Simple search
58
60
59
                3.2. The result list
61
                             3.1.2. The result list
60
62
61
                             3.2.1. The result list right-click menu
62
63
                3.3. The preview window
63
                             3.1.3. The preview window
64
65
                             3.1.4. Complex/advanced search
66
67
                             3.1.5. The term explorer tool
68
69
                             3.1.6. Multiple databases
70
71
                             3.1.7. Document history
72
73
                             3.1.8. Sorting search results and collapsing
74
                             duplicates
75
76
                             3.1.9. Search tips, shortcuts
77
78
                             3.1.10. Customizing the search interface
79
80
                3.2. Searching with the KDE KIO slave
81
82
                             3.2.1. What's this
83
84
                             3.2.2. Searchable documents
85
86
                3.3. Searching on the command line
64
87
65
                3.4. The query language
88
                3.4. The query language
66
89
67
                3.5. Complex/advanced search
68
69
                3.6. The term explorer tool
70
71
                3.7. More about wildcards
90
                             3.4.1. More about wildcards
72
91
73
                3.8. Multiple databases
92
                3.5. Desktop integration
74
93
75
                3.9. Document history
76
77
                3.10. Sorting search results and collapsing duplicates
78
79
                3.11. Search tips, shortcuts
80
81
                             3.11.1. Terms and search expansion
82
83
                             3.11.2. Working with phrases and proximity
84
85
                             3.11.3. Others
94
                             3.5.1. Hotkeying recoll
86
95
87
                3.12. Customizing the search interface
96
                             3.5.2. The KDE Kicker Recoll applet
88
97
89
                             3.12.1. The result list paragraph format
90
91
   4. Searching with the KDE KIO slave
92
93
                4.1. What's this
94
95
                4.2. Searchable documents
96
97
   5. Searching on the command line
98
99
   6. Programming interface
98
   4. Programming interface
100
99
101
                6.1. Writing a document filter
100
                4.1. Writing a document filter
102
101
103
                             6.1.1. Filter HTML output
102
                             4.1.1. Filter HTML output
104
103
105
                6.2. Field data processing
104
                4.2. Field data processing
106
105
107
                6.3. API
106
                4.3. API
108
107
109
                             6.3.1. Interface elements
108
                             4.3.1. Interface elements
110
109
111
                             6.3.2. Python interface
110
                             4.3.2. Python interface
112
111
113
   7. Installation
112
   5. Installation and configuration
114
113
115
                7.1. Installing a binary copy
114
                5.1. Installing a binary copy
116
115
117
                             7.1.1. Installing through a package system
116
                             5.1.1. Installing through a package system
118
117
119
                             7.1.2. Installing a prebuilt Recoll
118
                             5.1.2. Installing a prebuilt Recoll
120
119
121
                7.2. Supporting packages
120
                5.2. Supporting packages
122
121
123
                7.3. Building from source
122
                5.3. Building from source
124
123
125
                             7.3.1. Prerequisites
124
                             5.3.1. Prerequisites
126
125
127
                             7.3.2. Building
126
                             5.3.2. Building
128
127
129
                             7.3.3. Installation
128
                             5.3.3. Installation
130
129
131
                7.4. Configuration overview
130
                5.4. Configuration overview
132
131
133
                             7.4.1. Main configuration file
132
                             5.4.1. Main configuration file
134
133
135
                             7.4.2. The fields file
134
                             5.4.2. The fields file
136
135
137
                             7.4.3. The mimemap file
136
                             5.4.3. The mimemap file
138
137
139
                             7.4.4. The mimeconf file
138
                             5.4.4. The mimeconf file
140
139
141
                             7.4.5. The mimeview file
140
                             5.4.5. The mimeview file
142
141
143
                             7.4.6. Examples of configuration adjustments
142
                             5.4.6. Examples of configuration adjustments
144
145
                7.5. The KDE Kicker Recoll applet
146
143
147
     ----------------------------------------------------------------------
144
     ----------------------------------------------------------------------
148
145
149
                            Chapter 1. Introduction
146
                            Chapter 1. Introduction
150
147
...
...
578
   it if your system is short on resources. Periodic indexing is adequate in
575
   it if your system is short on resources. Periodic indexing is adequate in
579
   most cases.
576
   most cases.
580
577
581
     ----------------------------------------------------------------------
578
     ----------------------------------------------------------------------
582
579
580
                              Chapter 3. Searching
581
583
           Chapter 3. Searching with the Qt graphical user interface
582
3.1. Searching with the Qt graphical user interface
584
583
585
   The recoll program provides the main user interface for searching. It is
584
   The recoll program provides the main user interface for searching. It is
586
   based on the Qt library.
585
   based on the Qt library.
587
586
588
   recoll has two search modes:
587
   recoll has two search modes:
...
...
606
   white space in this case (they would typically be printed without white
605
   white space in this case (they would typically be printed without white
607
   space).
606
   space).
608
607
609
     ----------------------------------------------------------------------
608
     ----------------------------------------------------------------------
610
609
611
3.1. Simple search
610
  3.1.1. Simple search
612
611
613
    1. Start the recoll program.
612
    1. Start the recoll program.
614
613
615
    2. Possibly choose a search mode: Any term, All terms, File name or Query
614
    2. Possibly choose a search mode: Any term, All terms, File name or Query
616
       language.
615
       language.
...
...
666
665
667
   You can use the Tools / Advanced search dialog for more complex searches.
666
   You can use the Tools / Advanced search dialog for more complex searches.
668
667
669
     ----------------------------------------------------------------------
668
     ----------------------------------------------------------------------
670
669
671
3.2. The result list
670
  3.1.2. The result list
672
671
673
   After starting a search, a list of results will instantly be displayed in
672
   After starting a search, a list of results will instantly be displayed in
674
   the main list window.
673
   the main list window.
675
674
676
   By default, the document list is presented in order of relevance (how well
675
   By default, the document list is presented in order of relevance (how well
...
...
712
   the preferences). Use the arrow buttons in the toolbar or the links at the
711
   the preferences). Use the arrow buttons in the toolbar or the links at the
713
   bottom of the page to browse the results.
712
   bottom of the page to browse the results.
714
713
715
     ----------------------------------------------------------------------
714
     ----------------------------------------------------------------------
716
715
717
  3.2.1. The result list right-click menu
716
    3.1.2.1. The result list right-click menu
718
717
719
   Apart from the preview and edit links, you can display a pop-up menu by
718
   Apart from the preview and edit links, you can display a pop-up menu by
720
   right-clicking over a paragraph in the result list. This menu has the
719
   right-clicking over a paragraph in the result list. This menu has the
721
   following entries:
720
   following entries:
722
721
723
     * Preview
722
     * Preview
724
723
725
     * Edit
724
     * Open
726
725
727
     * Copy File Name
726
     * Copy File Name
728
727
729
     * Copy Url
728
     * Copy Url
730
729
...
...
734
733
735
     * Preview Parent document
734
     * Preview Parent document
736
735
737
     * Open Parent document
736
     * Open Parent document
738
737
739
   The Preview and Edit entries do the same thing as the corresponding links.
738
   The Preview and Open entries do the same thing as the corresponding links.
740
739
741
   The Copy File Name and Copy Url copy the relevant data to the clipboard,
740
   The Copy File Name and Copy Url copy the relevant data to the clipboard,
742
   for later pasting.
741
   for later pasting.
743
742
744
   Save to File allows saving the contents of a result document to a chosen
743
   Save to File allows saving the contents of a result document to a chosen
...
...
762
   this case. In other cases, the Open option makes sense, for example to
761
   this case. In other cases, the Open option makes sense, for example to
763
   start a chm viewer on the parent document for a help page.
762
   start a chm viewer on the parent document for a help page.
764
763
765
     ----------------------------------------------------------------------
764
     ----------------------------------------------------------------------
766
765
767
3.3. The preview window
766
  3.1.3. The preview window
768
767
769
   The preview window opens when you first click a Preview link inside the
768
   The preview window opens when you first click a Preview link inside the
770
   result list.
769
   result list.
771
770
772
   Subsequent preview requests for a given search open new tabs in the
771
   Subsequent preview requests for a given search open new tabs in the
...
...
806
   You can print the current preview window contents by typing ^P (Ctrl + P)
805
   You can print the current preview window contents by typing ^P (Ctrl + P)
807
   in the window text.
806
   in the window text.
808
807
809
     ----------------------------------------------------------------------
808
     ----------------------------------------------------------------------
810
809
811
3.4. The query language
812
813
   The query language processor is activated on the simple search entry when
814
   the search mode selector is set to Query Language.
815
816
   The language is roughly based on the Xesam user search language
817
   specification.
818
819
   Here follows a sample request that we are going to explain:
820
821
           author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes
822
     
823
824
   This would search for all documents with John Doe appearing as a phrase in
825
   the author field (exactly what this is would depend on the document type,
826
   ie: the From: header, for an email message), and containing either beatles
827
   or lennon and either live or unplugged but not potatoes (in any part of
828
   the document).
829
830
   An element is composed of an optional field specification, and a value,
831
   separated by a colon. Example: Beatles, author:balzac, dc:title:grandet
832
833
   The colon, if present, means "contains". Xesam defines other relations,
834
   which are not supported for now.
835
836
   All elements in the search entry are normally combined with an implicit
837
   AND. It is possible to specify that elements be OR'ed instead, as in
838
   Beatles OR Lennon. The OR must be entered literally (capitals), and it has
839
   priority over the AND associations: word1 word2 OR word3 means word1 AND
840
   (word2 OR word3) not (word1 AND word2) OR word3. Do not enter explicit
841
   parentheses, as they are not supported for now.
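   The precedence rule above (OR binds tighter than the implicit AND) can be
   illustrated with a minimal Python sketch. This is not Recoll's parser:
   the function name group_query is hypothetical, and the sketch ignores
   quoted phrases, field specifications and negation. It only shows how
   whitespace-separated elements fall into AND'ed groups of OR'ed
   alternatives.

```python
def group_query(query):
    """Return a list of groups; groups are AND'ed, members of a group are OR'ed.

    Illustration only: the input is split on white space, so quoted phrases
    are not handled.
    """
    groups = []
    expect_alternative = False  # True right after a literal OR
    for token in query.split():
        if token == "OR":  # must be capitals, as the text requires
            if not groups or expect_alternative:
                raise ValueError("OR needs a left operand")
            expect_alternative = True
        elif expect_alternative:
            groups[-1].append(token)  # attach to the previous element
            expect_alternative = False
        else:
            groups.append([token])  # start a new AND'ed group
    if expect_alternative:
        raise ValueError("OR needs a right operand")
    return groups


print(group_query("word1 word2 OR word3"))
# prints [['word1'], ['word2', 'word3']]
```

   The printed structure reads as word1 AND (word2 OR word3), which is the
   grouping the paragraph above describes.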
842
843
   An element preceded by a - specifies a term that should not appear. Pure
844
   negative queries are forbidden.
845
846
   As usual, words inside quotes define a phrase (the order of words is
847
   significant), so that title:"prejudice pride" is not the same as
848
   title:prejudice title:pride, and is unlikely to find a result.
849
850
   Recoll currently manages the following default fields:
851
852
     * title, subject or caption are synonyms which specify data to be
853
       searched for in the document title or subject.
854
855
     * author or from for searching the document's originators.
856
857
     * recipient or to for searching the document's recipients.
858
859
     * keyword for searching the document-specified keywords (few documents
860
       actually have any).
861
862
     * filename for the document's file name.
863
864
     * ext specifies the file name extension (Ex: ext:html)
865
866
   The field syntax also supports a few field-like, but special, criteria:
867
868
     * dir for filtering the results on file location (Ex:
869
       dir:/home/me/somedir). Please note that this is quite inefficient,
870
       that it may produce very slow searches, and that it may be worthwhile
871
       in some cases to set up separate databases instead.
872
873
     * date for searching or filtering on dates. The syntax for the argument
874
       is based on the ISO8601 standard for dates and time intervals. Only
875
       dates are supported, no times. The general syntax is 2 elements
876
       separated by a / character. Each element can be a date or a period of
877
       time. Periods are specified as PnYnMnD. The n numbers are the
878
       respective numbers of years, months or days, any of which may be
879
       missing. Dates are specified as YYYY-MM-DD. The days and months parts
880
       may be missing. If the / is present but an element is missing, the
881
       missing element is interpreted as the lowest or highest date in the
882
       index. Examples:
883
884
          * 2001-03-01/2002-05-01 the basic syntax for an interval of dates.
885
886
          * 2001-03-01/P1Y2M the same specified with a period.
887
888
          * 2001/ from the beginning of 2001 to the latest date in the index.
889
890
          * 2001 the whole year of 2001
891
892
          * P2D/ means 2 days ago up to now if there are no documents with
893
            dates in the future.
894
895
          * /2003 all documents from 2003 or older.
896
897
       Periods can also be specified with lowercase letters (e.g. p2y).
898
899
     * mime or format for specifying the mime type. This one is quite special
900
       because you can specify several values which will be OR'ed (the normal
901
       default for the language is AND). Ex: mime:text/plain mime:text/html.
902
       Specifying an explicit boolean operator or negation (-) before a mime
903
       specification is not supported and will produce strange results.
904
905
     * type or rclcat for specifying the category (as in
906
       text/media/presentation/etc.). The classification of mime types in
907
       categories is defined in the Recoll configuration (mimeconf), and can
908
       be modified or extended. The default category names are those which
909
       permit filtering results in the main GUI screen. Categories are OR'ed
910
       like mime types above.
911
912
   The document filters used while indexing have the possibility to create
913
   other fields with arbitrary names, and aliases may be defined in the
914
   configuration, so that the exact field search possibilities may be
915
   different for you if someone took care of the customisation.
916
917
   The query language is currently the only way to use the Recoll field
918
   search capability.
919
920
   Words inside phrases and capitalized words are not stem-expanded.
921
   Wildcards may be used anywhere inside a term. Specifying a wild-card on
922
   the left of a term can produce a very slow search (or even an incorrect
923
   one if the expansion is truncated because of excessive size).
924
925
   You can use the show query link at the top of the result list to check the
926
   exact query which was finally executed by Xapian.
927
928
   Most Xesam phrase modifiers are unsupported, except for l (small ell) to
929
   disable stemming, and p to turn a phrase into a NEAR (unordered) search.
930
   Example: "prejudice pride"p
931
932
     ----------------------------------------------------------------------
933
934
3.5. Complex/advanced search
810
  3.1.4. Complex/advanced search
935
811
936
   The advanced search dialog helps you build more complex queries. It can be
812
   The advanced search dialog helps you build more complex queries without
937
   opened through the Tools menu or through the main toolbar.
813
   memorizing the search language constructs. It can be opened through the
814
   Tools menu or through the main toolbar.
938
815
939
   The dialog has three parts:
816
   The dialog has three parts:
940
817
941
     * The top part allows constructing a query by combining multiple clauses
818
     * The top part allows constructing a query by combining multiple clauses
942
       of different types. Each entry field is configurable for the following
819
       of different types. Each entry field is configurable for the following
...
...
995
   Click on the Show query details link at the top of the result page to see
872
   Click on the Show query details link at the top of the result page to see
996
   the query expansion.
873
   the query expansion.
997
874
998
     ----------------------------------------------------------------------
875
     ----------------------------------------------------------------------
999
876
1000
3.6. The term explorer tool
877
  3.1.5. The term explorer tool
1001
878
1002
   Recoll automatically manages the expansion of search terms to their
879
   Recoll automatically manages the expansion of search terms to their
1003
   derivatives (ie: plural/singular, verb inflections). But there are other
880
   derivatives (ie: plural/singular, verb inflections). But there are other
1004
   cases where the exact search term is not known. For example, you may not
881
   cases where the exact search term is not known. For example, you may not
1005
   remember the exact spelling, or only know the beginning of the name.
882
   remember the exact spelling, or only know the beginning of the name.
...
...
1050
   simple search entry field. You can also cut/paste between the result list
927
   simple search entry field. You can also cut/paste between the result list
1051
   and any entry field (the end of lines will be taken care of).
928
   and any entry field (the end of lines will be taken care of).
1052
929
1053
     ----------------------------------------------------------------------
930
     ----------------------------------------------------------------------
1054
931
1055
3.7. More about wildcards
1056
1057
   All words entered in Recoll search fields will be processed for wildcard
1058
   expansion before the request is finally executed.
1059
1060
   The wildcard characters are:
1061
1062
     * * which matches 0 or more characters.
1063
1064
     * ? which matches a single character.
1065
1066
     * [] which allow defining sets of characters to be matched (ex: [abc]
1067
       matches a single character which may be 'a' or 'b' or 'c', [0-9]
1068
       matches any digit).
1069
1070
   You should be aware of a few things before using wildcards.
1071
1072
     * Using a wildcard character at the beginning of a word can make for a
1073
       slow search because Recoll will have to scan the whole index term list
1074
       to find the matches.
1075
1076
     * Using a * at the end of a word can produce more matches than you would
1077
       think, and strange search results. You can use the term explorer tool
1078
       to check what completions exist for a given term. You can also see
1079
       exactly what search was performed by clicking on the link at the top
1080
       of the result list. In general, for natural language terms, stem
1081
       expansion will produce better results than an ending * (stem expansion
1082
       is turned off when any wildcard character appears in the term).
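   The three wildcard forms listed above behave like shell-style patterns,
   so their effect can be tried out with Python's standard fnmatch module.
   This is only an illustration: the term list and the function name expand
   are made up, and Recoll expands wildcards against its own index term
   list, not with this code.

```python
import fnmatch

# A made-up stand-in for the index term list
TERMS = ["index", "indexer", "indexing", "indent", "reindex", "term1", "term2"]


def expand(pattern, terms=TERMS):
    """Return the terms matching a wildcard pattern (case-sensitive).

    Every term is tested in turn, which mirrors the full scan of the term
    list that the text above warns about for leading wildcards.
    """
    return [t for t in terms if fnmatch.fnmatchcase(t, pattern)]


print(expand("index*"))     # prints ['index', 'indexer', 'indexing']
print(expand("inde?"))      # prints ['index']
print(expand("term[0-9]"))  # prints ['term1', 'term2']
print(expand("*index"))     # prints ['index', 'reindex']
```

   The first call shows why an ending * can return more terms than expected:
   every completion of the prefix is a match.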
1083
1084
     ----------------------------------------------------------------------
1085
1086
3.8. Multiple databases
932
  3.1.6. Multiple databases
1087
933
1088
   Multiple Recoll databases or indexes can be created by using several
934
   Multiple Recoll databases or indexes can be created by using several
1089
   configuration directories which are usually set to index different areas
935
   configuration directories which are usually set to index different areas
1090
   of the file system. A specific index can be selected for updating or
936
   of the file system. A specific index can be selected for updating or
1091
   searching, using the RECOLL_CONFDIR environment variable or the -c option
937
   searching, using the RECOLL_CONFDIR environment variable or the -c option
...
...
1126
   with the directory filter in advanced search, but multiple indexes will
972
   with the directory filter in advanced search, but multiple indexes will
1127
   have much better performance and may be worth the trouble.
973
   have much better performance and may be worth the trouble.
1128
974
1129
     ----------------------------------------------------------------------
975
     ----------------------------------------------------------------------
1130
976
1131
3.9. Document history
977
  3.1.7. Document history
1132
978
1133
   Documents that you actually view (with the internal preview or an external
979
   Documents that you actually view (with the internal preview or an external
1134
   tool) are entered into the document history, which is remembered.
980
   tool) are entered into the document history, which is remembered.
1135
981
1136
   You can display the history list by using the Tools/Doc History menu
982
   You can display the history list by using the Tools/Doc History menu
...
...
1139
   You can erase the document history by using the Erase document history
985
   You can erase the document history by using the Erase document history
1140
   entry in the File menu.
986
   entry in the File menu.
1141
987
1142
     ----------------------------------------------------------------------
988
     ----------------------------------------------------------------------
1143
989
1144
3.10. Sorting search results and collapsing duplicates
990
  3.1.8. Sorting search results and collapsing duplicates
1145
991
1146
   The documents in a result list are normally sorted in order of relevance.
992
   The documents in a result list are normally sorted in order of relevance.
1147
   It is possible to specify different sort parameters by using the Sort
993
   It is possible to specify different sort parameters by using the Sort
1148
   parameters dialog (located in the Tools menu).
994
   parameters dialog (located in the Tools menu).
1149
995
...
...
1166
   not be a duplicate of the text only). Duplicates hiding is controlled by
1012
   not be a duplicate of the text only). Duplicates hiding is controlled by
1167
   an entry in the Query configuration dialog, and is off by default.
1013
   an entry in the Query configuration dialog, and is off by default.
1168
1014
1169
     ----------------------------------------------------------------------
1015
     ----------------------------------------------------------------------
1170
1016
1171
3.11. Search tips, shortcuts
1017
  3.1.9. Search tips, shortcuts
1172
1018
1173
  3.11.1. Terms and search expansion
1019
    3.1.9.1. Terms and search expansion
1174
1020
1175
   Term completion. Typing Esc Space in the simple search entry field while
1021
   Term completion. Typing Esc Space in the simple search entry field while
1176
   entering a word will either complete the current word if its beginning
1022
   entering a word will either complete the current word if its beginning
1177
   matches a unique term in the index, or open a window to propose a list of
1023
   matches a unique term in the index, or open a window to propose a list of
1178
   completions.
1024
   completions.
...
...
1207
   file name search which will only look for file names, and may be faster
1053
   file name search which will only look for file names, and may be faster
1208
   than the generic search especially when using wildcards.
1054
   than the generic search especially when using wildcards.
1209
1055
1210
     ----------------------------------------------------------------------
1056
     ----------------------------------------------------------------------
1211
1057
1212
  3.11.2. Working with phrases and proximity
1058
    3.1.9.2. Working with phrases and proximity
1213
1059
1214
   Phrases and Proximity searches. A phrase can be looked for by enclosing it
1060
   Phrases and Proximity searches. A phrase can be looked for by enclosing it
1215
   in double quotes. Example: "user manual" will look only for occurrences of
1061
   in double quotes. Example: "user manual" will look only for occurrences of
1216
   user immediately followed by manual. You can use the This phrase field of
1062
   user immediately followed by manual. You can use the This phrase field of
1217
   the advanced search dialog to the same effect. Phrases can be entered
1063
   the advanced search dialog to the same effect. Phrases can be entered
...
...
1226
   documents where either virtual or reality or both appear, but those which
1072
   documents where either virtual or reality or both appear, but those which
1227
   contain virtual reality should appear sooner in the list.
1073
   contain virtual reality should appear sooner in the list.
1228
1074
1229
     ----------------------------------------------------------------------
1075
     ----------------------------------------------------------------------
1230
1076
1231
  3.11.3. Others
1077
    3.1.9.3. Others
1232
1078
1233
   Using fields. You can use the query language and field specifications to
1079
   Using fields. You can use the query language and field specifications to
1234
   only search certain parts of documents. This can be especially helpful
1080
   only search certain parts of documents. This can be especially helpful
1235
   with email, for example only searching emails from a specific originator:
1081
   with email, for example only searching emails from a specific originator:
1236
   search tips from:helpfulgui
1082
   search tips from:helpfulgui
...
...
1261
1107
1262
   Quitting. Entering ^Q almost anywhere will close the application.
1108
   Quitting. Entering ^Q almost anywhere will close the application.
1263
1109
1264
     ----------------------------------------------------------------------
1110
     ----------------------------------------------------------------------
1265
1111
1266
3.12. Customizing the search interface
1112
  3.1.10. Customizing the search interface
1267
1113
1268
   You can customize some aspects of the search interface by using the Query
1114
   You can customize some aspects of the search interface by using the Query
1269
   configuration entry in the Preferences menu.
1115
   configuration entry in the Preferences menu.
1270
1116
1271
   There are several tabs in the dialog, dealing with the interface itself,
1117
   There are several tabs in the dialog, dealing with the interface itself,
...
...
1297
       involves quite a lot of processing, and can be disabled over the given
1143
       involves quite a lot of processing, and can be disabled over the given
1298
       text size to speed up loading.
1144
       text size to speed up loading.
1299
1145
1300
     * Use desktop preferences to choose document editor: if this is checked,
1146
     * Use desktop preferences to choose document editor: if this is checked,
1301
       the xdg-open utility will be used to open files when you click the
1147
       the xdg-open utility will be used to open files when you click the
1302
       Edit link in the result list, instead of the application defined in
1148
       Open link in the result list, instead of the application defined in
1303
       mimeview. xdg-open will in turn use your desktop preferences to choose
1149
       mimeview. xdg-open will in turn use your desktop preferences to choose
1304
       an appropriate application.
1150
       an appropriate application.
1305
1151
1306
     * Choose editor applications this will let you choose the command
1152
     * Choose editor applications this will let you choose the command
1307
       started by the Edit links inside the result list, for specific
1153
       started by the Open links inside the result list, for specific
1308
       document types.
1154
       document types.
1309
1155
1310
     * Display category filter as toolbar... this will let you choose if the
1156
     * Display category filter as toolbar... this will let you choose if the
1311
       document categories are displayed as a list or a set of buttons.
1157
       document categories are displayed as a list or a set of buttons.
1312
1158
...
...
1378
   alternative indexer may also need to implement a way of purging the index
1224
   alternative indexer may also need to implement a way of purging the index
1379
   from stale data,
1225
   from stale data,
1380
1226
1381
     ----------------------------------------------------------------------
1227
     ----------------------------------------------------------------------
1382
1228
1383
  3.12.1. The result list paragraph format
1229
    3.1.10.1. The result list paragraph format
1384
1230
1385
   The presentation of each result inside the result list can be customized
1231
   The presentation of each result inside the result list can be customized
1386
   by setting the result list paragraph format inside the User Interface tab
1232
   by setting the result list paragraph format inside the User Interface tab
1387
   of the Query configuration.
1233
   of the Query configuration.
1388
1234
...
...
1457
   if the custom formatting results in multiple paragraphs per result, right
1303
   if the custom formatting results in multiple paragraphs per result, right
1458
   clicks will only work inside the first one.
1304
   clicks will only work inside the first one.
1459
1305
1460
     ----------------------------------------------------------------------
1306
     ----------------------------------------------------------------------
1461
1307
1462
                  Chapter 4. Searching with the KDE KIO slave
1308
3.2. Searching with the KDE KIO slave
1463
1309
1464
4.1. What's this
1310
  3.2.1. What's this
1465
1311
1466
   The Recoll KIO slave allows performing a Recoll search by entering an
1312
   The Recoll KIO slave allows performing a Recoll search by entering an
1467
   appropriate URL in a KDE open dialog, or with an HTML-based interface
1313
   appropriate URL in a KDE open dialog, or with an HTML-based interface
1468
   displayed in Konqueror.
1314
   displayed in Konqueror.
1469
1315
...
...
1480
   The interface is described in more detail inside a help file which you can
1326
   The interface is described in more detail inside a help file which you can
1481
   access by entering recoll:/ inside the konqueror URL line (this works only
1327
   access by entering recoll:/ inside the konqueror URL line (this works only
1482
   if the recoll KIO slave has been previously installed).
1328
   if the recoll KIO slave has been previously installed).
1483
1329
1484
   The instructions for building this module are located in the source tree.
1330
   The instructions for building this module are located in the source tree.
1485
   See: kde/kio/recoll/00README.txt
1331
   See: kde/kio/recoll/00README.txt. Some Linux distributions do package the
1332
   kio-recoll module, so check before diving into the build process, maybe
1333
   it's already out there ready for one-click installation.
1486
1334
1487
     ----------------------------------------------------------------------
1335
     ----------------------------------------------------------------------
1488
1336
1489
4.2. Searchable documents
1337
  3.2.2. Searchable documents
1490
1338
1491
   As a sample application, the Recoll KIO slave could allow preparing a set
1339
   As a sample application, the Recoll KIO slave could allow preparing a set
1492
   of HTML documents (for example a manual) so that they become their own
1340
   of HTML documents (for example a manual) so that they become their own
1493
   search interface inside konqueror.
1341
   search interface inside konqueror.
1494
1342
...
...
1507
  ....
1355
  ....
1508
 <body ondblclick="recollsearch()">
1356
 <body ondblclick="recollsearch()">
1509
1357
1510
     ----------------------------------------------------------------------
1358
     ----------------------------------------------------------------------
1511
1359
1512
                    Chapter 5. Searching on the command line
1360
3.3. Searching on the command line
1513
1361
1514
   There are several ways to obtain search results as a text stream, without
1362
   There are several ways to obtain search results as a text stream, without
1515
   a graphical interface:
1363
   a graphical interface:
1516
1364
1517
     * By passing option -t to the recoll program.
1365
     * By passing option -t to the recoll program.
...
...
1523
   The first two methods work in the same way and accept/need the same
1371
   The first two methods work in the same way and accept/need the same
1524
   arguments (except for the additional -t to recoll). The query to be
1372
   arguments (except for the additional -t to recoll). The query to be
1525
   executed is specified as command line arguments.
1373
   executed is specified as command line arguments.
1526
1374
1527
   recollq is not built by default. You can use the Makefile in the query
1375
   recollq is not built by default. You can use the Makefile in the query
1528
   directory to build it. This is a very simple program, and it will often be
1376
   directory to build it. This is a very simple program, and if you can
1529
   useful to tailor its output format to your needs.
1377
   program a little C++, you may find it useful to tailor its output format
1378
   to your needs.
1530
1379
1531
   recollq has a man page (not installed by default, look in the doc/man
1380
   recollq has a man page (not installed by default, look in the doc/man
1532
   directory). The Usage string is as follows:
1381
   directory). The Usage string is as follows:
1533
1382
1534
 recollq [-o|-a|-f] <query string>
1383
 recollq [-o|-a|-f] <query string>
...
...
1557
 text/html       [file:///Users/uncrypted-dockes/projets/pagepers/index.html]    [psxtcl/writemime/recoll]...
1406
 text/html       [file:///Users/uncrypted-dockes/projets/pagepers/index.html]    [psxtcl/writemime/recoll]...
1558
 text/html       [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/recu-chasse-maree....
1407
 text/html       [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/recu-chasse-maree....
1559
1408
1560
     ----------------------------------------------------------------------
1409
     ----------------------------------------------------------------------
1561
1410
1411
3.4. The query language
1412
1413
   The query language processor is activated in the GUI simple search entry
1414
   when the search mode selector is set to Query Language. It can also be
1415
   used with the KIO slave or the command line search. It broadly has the
1416
   same capabilities as the complex search interface in the GUI.
1417
   Additionally, the query language is for now the only way to access the
1418
   important Recoll field search capabilities.
1419
1420
   The language is roughly based on the Xesam user search language
1421
   specification.
1422
1423
   If the results of a query language search puzzle you and you doubt what
1424
   has been actually searched for, you can use the GUI show query link at the
1425
   top of the result list to check the exact query which was finally executed
1426
   by Xapian.
1427
1428
   Here follows a sample request that we are going to explain:
1429
1430
           author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes
1431
     
1432
1433
   This would search for all documents with John Doe appearing as a phrase in
1434
   the author field (exactly what this is would depend on the document type,
1435
   e.g. the From: header for an email message), and containing either beatles
1436
   or lennon and either live or unplugged but not potatoes (in any part of
1437
   the document).
1438
1439
   An element is composed of an optional field specification, and a value,
1440
   separated by a colon. Example: Beatles, author:balzac, dc:title:grandet
1441
1442
   The colon, if present, means "contains". Xesam defines other relations,
1443
   which are not supported for now.
1444
1445
   All elements in the search entry are normally combined with an implicit
1446
   AND. It is possible to specify that elements be OR'ed instead, as in
1447
   Beatles OR Lennon. The OR must be entered literally (capitals), and it has
1448
   priority over the AND associations: word1 word2 OR word3 means word1 AND
1449
   (word2 OR word3) not (word1 AND word2) OR word3. Do not enter explicit
1450
   parentheses, as they are not supported for now.
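   The grouping rule above (implicit AND between elements, with OR binding
   more tightly) can be illustrated with a small Python sketch. This is not
   Recoll's parser: group_query is a hypothetical helper written for this
   example, and it ignores fields, phrases and negation.

```python
def group_query(elements):
    """Group a flat list of query elements the way the text describes.

    Returns a list of clauses. The clauses are AND-ed together, and the
    terms inside one clause are OR-ed.
    """
    clauses = []
    join_next = False  # True right after a literal OR
    for element in elements:
        if element == "OR":  # OR must be entered in capitals
            join_next = True
            continue
        if join_next and clauses:
            # OR has priority: attach the term to the previous clause
            clauses[-1].append(element)
        else:
            # Implicit AND: start a new clause
            clauses.append([element])
        join_next = False
    return clauses


print(group_query("word1 word2 OR word3".split()))
# [['word1'], ['word2', 'word3']]
```

   The result reads as word1 AND (word2 OR word3), which matches the
   priority rule stated above.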
1451
1452
   An element preceded by a - specifies a term that should not appear. Pure
1453
   negative queries are forbidden.
1454
1455
   As usual, words inside quotes define a phrase (the order of words is
1456
   significant), so that title:"prejudice pride" is not the same as
1457
   title:prejudice title:pride, and is unlikely to find a result.
1458
1459
   Most Xesam phrase modifiers are unsupported, except for l (small ell) to
1460
   disable stemming, and p to turn a phrase into a NEAR (unordered proximity)
1461
   search. Example: "prejudice pride"p
1462
1463
   Recoll currently manages the following default fields:
1464
1465
     * title, subject or caption are synonyms which specify data to be
1466
       searched for in the document title or subject.
1467
1468
     * author or from for searching the document's originators.
1469
1470
     * recipient or to for searching the document's recipients.
1471
1472
     * keyword for searching the document-specified keywords (few documents
1473
       actually have any).
1474
1475
     * filename for the document's file name.
1476
1477
     * ext specifies the file name extension (Ex: ext:html)
1478
1479
   The field syntax also supports a few field-like, but special, criteria:
1480
1481
     * dir for filtering the results on file location (Ex:
1482
       dir:/home/me/somedir). Please note that this is quite inefficient,
1483
       that it may produce very slow searches, and that it may be worthwhile in
1484
       some cases to set up separate databases instead.
1485
1486
     * date for searching or filtering on dates. The syntax for the argument
1487
       is based on the ISO8601 standard for dates and time intervals. Only
1488
       dates are supported, no times. The general syntax is 2 elements
1489
       separated by a / character. Each element can be a date or a period of
1490
       time. Periods are specified as PnYnMnD. The n numbers are the
1491
       respective numbers of years, months or days, any of which may be
1492
       missing. Dates are specified as YYYY-MM-DD. The days and months parts
1493
       may be missing. If the / is present but an element is missing, the
1494
       missing element is interpreted as the lowest or highest date in the
1495
       index. Examples:
1496
1497
          * 2001-03-01/2002-05-01 the basic syntax for an interval of dates.
1498
1499
          * 2001-03-01/P1Y2M the same specified with a period.
1500
1501
          * 2001/ from the beginning of 2001 to the latest date in the index.
1502
1503
          * 2001 the whole year of 2001
1504
1505
          * P2D/ means 2 days ago up to now if there are no documents with
1506
            dates in the future.
1507
1508
          * /2003 all documents from 2003 or older.
1509
1510
       Periods can also be specified with small letters (e.g. p2y).
1511
1512
     * mime or format for specifying the mime type. This one is quite special
1513
       because you can specify several values which will be OR'ed (the normal
1514
       default for the language is AND). Ex: mime:text/plain mime:text/html.
1515
       Specifying an explicit boolean operator or negation (-) before a mime
1516
       specification is not supported and will produce strange results. Note
1517
       that mime is the ONLY field with an OR default. You do need to use OR
1518
       with ext terms for example.
1519
1520
     * type or rclcat for specifying the category (as in
1521
       text/media/presentation/etc.). The classification of mime types in
1522
       categories is defined in the Recoll configuration (mimeconf), and can
1523
       be modified or extended. The default category names are those which
1524
       permit filtering results in the main GUI screen. Categories are OR'ed
1525
       like mime types above.
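   As an illustration of the date argument syntax described in the list
   above, here is a minimal Python sketch which splits a specification into
   its two elements and classifies each one as a date, a period, or a
   missing (open) bound. The names parse_element and parse_interval are
   hypothetical and this is not the code used by Recoll. The sketch does not
   resolve periods into actual dates.

```python
import re

# YYYY, YYYY-MM or YYYY-MM-DD (month and day parts may be missing)
DATE_RE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")
# PnYnMnD, any part may be missing, small letters accepted
PERIOD_RE = re.compile(r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?$", re.IGNORECASE)


def parse_element(text):
    """Return None for a missing element, else a ('date', ...) or
    ('period', ...) tuple."""
    if text == "":
        return None
    match = DATE_RE.match(text)
    if match:
        # Missing month or day stays None
        return ("date",) + tuple(int(g) if g else None for g in match.groups())
    match = PERIOD_RE.match(text)
    if match and any(match.groups()):
        # Missing year, month or day counts default to 0
        return ("period",) + tuple(int(g) if g else 0 for g in match.groups())
    raise ValueError("not a date or period: %r" % text)


def parse_interval(spec):
    """Split 'first/second' into two parsed elements.

    A lone date (no '/') stands for the whole period it names, so the
    same element is returned for both ends.
    """
    if "/" not in spec:
        element = parse_element(spec)
        return (element, element)
    first, second = spec.split("/", 1)
    return (parse_element(first), parse_element(second))


print(parse_interval("2001-03-01/P1Y2M"))
# (('date', 2001, 3, 1), ('period', 1, 2, 0))
```

   A None element corresponds to the case where the / is present but one
   side is missing, which the text above interprets as the lowest or highest
   date in the index.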
1526
1527
   Words inside phrases and capitalized words are not stem-expanded.
1528
   Wildcards may be used anywhere inside a term. Specifying a wild-card on
1529
   the left of a term can produce a very slow search (or even an incorrect
1530
   one if the expansion is truncated because of excessive size). Also see
1531
   More about wildcards.
1532
1533
   The document filters used while indexing have the possibility to create
1534
   other fields with arbitrary names, and aliases may be defined in the
1535
   configuration, so that the exact field search possibilities may be
1536
   different for you if someone took care of the customisation.
1537
1538
     ----------------------------------------------------------------------
1539
1540
  3.4.1. More about wildcards
1541
1542
   All words entered in Recoll search fields will be processed for wildcard
1543
   expansion before the request is finally executed.
1544
1545
   The wildcard characters are:
1546
1547
     * * which matches 0 or more characters.
1548
1549
     * ? which matches a single character.
1550
1551
     * [] which allow defining sets of characters to be matched (ex: [abc]
1552
       matches a single character which may be 'a' or 'b' or 'c', [0-9]
1553
       matches any digit).
1554
1555
   You should be aware of a few things before using wildcards.
1556
1557
     * Using a wildcard character at the beginning of a word can make for a
1558
       slow search because Recoll will have to scan the whole index term list
1559
       to find the matches.
1560
1561
     * Using a * at the end of a word can produce more matches than you would
1562
       think, and strange search results. You can use the term explorer tool
1563
       to check what completions exist for a given term. You can also see
1564
       exactly what search was performed by clicking on the link at the top
1565
       of the result list. In general, for natural language terms, stem
1566
       expansion will produce better results than an ending * (stem expansion
1567
       is turned off when any wildcard character appears in the term).
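   The effect of wildcard expansion can be tried out on a small term list
   with Python's standard fnmatch module, which implements the same three
   wildcard forms. This is only a stand-in for illustration: the term list
   is made up, expand_wildcard is a hypothetical helper, and Recoll's own
   expansion runs against the index term list, not a Python list.

```python
from fnmatch import fnmatchcase


def expand_wildcard(pattern, index_terms):
    """Return the sorted terms matching a *, ? or [] pattern.

    Every term has to be tested, which is why a leading wildcard
    forces a scan of the whole term list.
    """
    return sorted(term for term in index_terms if fnmatchcase(term, pattern))


# Made-up sample of index terms
terms = ["float", "floor", "flow", "flower", "flown", "fl0w", "overflow"]

print(expand_wildcard("flo*", terms))
# ['float', 'floor', 'flow', 'flower', 'flown']
print(expand_wildcard("flo?", terms))
# ['flow']
print(expand_wildcard("fl[0-9]w", terms))
# ['fl0w']
print(expand_wildcard("*flow", terms))
# ['flow', 'overflow']
```

   The first call shows how an ending * can pull in unrelated terms such as
   float and floor, which is the situation where stem expansion usually
   gives better results.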
1568
1569
     ----------------------------------------------------------------------
1570
1571
3.5. Desktop integration
1572
1573
   Being independent of the desktop type has its drawbacks: Recoll desktop
1574
   integration is minimal. Here follow a few things that may help.
1575
1576
     ----------------------------------------------------------------------
1577
1578
  3.5.1. Hotkeying recoll
1579
1580
   It is surprisingly convenient to be able to show or hide the Recoll GUI
1581
   with a single keystroke. Recoll comes with a small python script, based on
1582
   the libwnck window manager interface library, which will allow you to do
1583
   just this. The detailed instructions are on this wiki page.
1584
1585
     ----------------------------------------------------------------------
1586
1587
  3.5.2. The KDE Kicker Recoll applet
1588
1589
   The Recoll source tree contains the source code to the recoll_applet, a
1590
   small application derived from the find_applet. This can be used to add a
1591
   small Recoll launcher to the KDE panel.
1592
1593
   The applet is not automatically built with the main Recoll programs, nor
1594
   is it included with the main source distribution (because the KDE build
1595
   boilerplate makes it relatively big). You can download its source from the
1596
   recoll.org download page. Use the omnipotent configure;make;make install
1597
   incantation to build and install.
1598
1599
   You can then add the applet to the panel by right-clicking the panel and
1600
   choosing the Add applet entry.
1601
1602
   The recoll_applet has a small text window where you can type a Recoll
1603
   query (in query language form), and an icon which can be used to restrict
1604
   the search to certain types of files. It is quite primitive, and launches
1605
   a new recoll GUI instance every time (even if it is already running). You
1606
   may find it useful anyway.
1607
1608
     ----------------------------------------------------------------------
1609
1562
                        Chapter 6. Programming interface
1610
                        Chapter 4. Programming interface
1563
1611
1564
   Recoll has an Application programming Interface, usable both for indexing
1612
   Recoll has an Application programming Interface, usable both for indexing
1565
   and searching, currently accessible from the Python language.
1613
   and searching, currently accessible from the Python language.
1566
1614
1567
   Another less radical way to extend the application is to write filters for
1615
   Another less radical way to extend the application is to write filters for
...
...
1570
   The processing of metadata attributes for documents (fields) is highly
1618
   The processing of metadata attributes for documents (fields) is highly
1571
   configurable.
1619
   configurable.
1572
1620
1573
     ----------------------------------------------------------------------
1621
     ----------------------------------------------------------------------
1574
1622
1575
6.1. Writing a document filter
1623
4.1. Writing a document filter
1576
1624
1577
   Recoll filters are executable programs which translate from a specific
1625
   Recoll filters are executable programs which translate from a specific
1578
   format (ie: openoffice, acrobat, etc.) to the Recoll indexing input
1626
   format (ie: openoffice, acrobat, etc.) to the Recoll indexing input
1579
   format, which may be text/plain or text/html.
1627
   format, which may be text/plain or text/html.
1580
1628
...
...
1648
   cannot specify the character set and other metadata, so they are limited
1696
   cannot specify the character set and other metadata, so they are limited
1649
   to cases where these elements are not needed.
1697
   to cases where these elements are not needed.
1650
1698
1651
     ----------------------------------------------------------------------
1699
     ----------------------------------------------------------------------
1652
1700
1653
  6.1.1. Filter HTML output
1701
  4.1.1. Filter HTML output
1654
1702
1655
   The output HTML could be very minimal like the following example:
1703
   The output HTML could be very minimal like the following example:
1656
1704
1657
 <html><head>
1705
 <html><head>
1658
 <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
1706
 <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
...
...
1681
   See the following section for details about configuring how field data is
1729
   See the following section for details about configuring how field data is
1682
   processed by the indexer.
1730
   processed by the indexer.
1683
1731
1684
     ----------------------------------------------------------------------
1732
     ----------------------------------------------------------------------
1685
1733
1686
6.2. Field data processing
1734
4.2. Field data processing
1687
1735
1688
   Fields are named pieces of information in or about documents, like title,
1736
   Fields are named pieces of information in or about documents, like title,
1689
   author, abstract.
1737
   author, abstract.
1690
1738
1691
   The field values for documents can appear in several ways during indexing:
1739
   The field values for documents can appear in several ways during indexing:
...
...
1714
   You can find more information in the section about the fields file, or in
1762
   You can find more information in the section about the fields file, or in
1715
   comments inside the file.
1763
   comments inside the file.
1716
1764
1717
     ----------------------------------------------------------------------
1765
     ----------------------------------------------------------------------
1718
1766
1719
6.3. API
1767
4.3. API
1720
1768
1721
  6.3.1. Interface elements
1769
  4.3.1. Interface elements
1722
1770
1723
   A few elements in the interface are specific and need an explanation.
1771
   A few elements in the interface are specific and need an explanation.
1724
1772
1725
   udi
1773
   udi
1726
1774
...
...
1757
   during indexing. The main indexer documents would also probably be a
1805
   during indexing. The main indexer documents would also probably be a
1758
   problem for the external indexer purge operation.
1806
   problem for the external indexer purge operation.
1759
1807
1760
     ----------------------------------------------------------------------
1808
     ----------------------------------------------------------------------
1761
1809
1762
  6.3.2. Python interface
1810
  4.3.2. Python interface
1763
1811
1764
    6.3.2.1. Introduction
1812
    4.3.2.1. Introduction
1765
1813
1766
   Recoll versions after 1.11 define a Python programming interface, both for
1814
   Recoll versions after 1.11 define a Python programming interface, both for
1767
   searching and indexing.
1815
   searching and indexing.
1768
1816
1769
   The Python interface is not built by default and can be found in the
1817
   The Python interface is not built by default and can be found in the
...
...
1787
   python setup.py build
1835
   python setup.py build
1788
   python setup.py install
1836
   python setup.py install
1789
1837
1790
     ----------------------------------------------------------------------
1838
     ----------------------------------------------------------------------
1791
1839
1792
    6.3.2.2. Interface manual
1840
    4.3.2.2. Interface manual
1793
1841
1794
   NAME
1842
   NAME
1795
       recoll - This is an interface to the Recoll full text indexer.
1843
       recoll - This is an interface to the Recoll full text indexer.
1796
1844
1797
   FILE
1845
   FILE
...
...
1977
2025
1978
   
2026
   
1979
2027
1980
     ----------------------------------------------------------------------
2028
     ----------------------------------------------------------------------
1981
2029
1982
    6.3.2.3. Example code
2030
    4.3.2.3. Example code
1983
2031
1984
   The following sample would query the index with a user language string.
2032
   The following sample would query the index with a user language string.
1985
   See the python/samples directory inside the Recoll source for other
2033
   See the python/samples directory inside the Recoll source for other
1986
   examples.
2034
   examples.
1987
2035
...
...
2008
2056
2009
 
2057
 
2010
2058
2011
     ----------------------------------------------------------------------
2059
     ----------------------------------------------------------------------
2012
2060
2013
                            Chapter 7. Installation
2061
                   Chapter 5. Installation and configuration
2014
2062
2015
7.1. Installing a binary copy
2063
5.1. Installing a binary copy
2016
2064
2017
   There are three types of binary Recoll installations:
2065
   There are three types of binary Recoll installations:
2018
2066
2019
     * Through your system normal software distribution framework (ie,
2067
     * Through your system normal software distribution framework (ie,
2020
       Debian/Ubuntu apt, FreeBSD ports, etc.).
2068
       Debian/Ubuntu apt, FreeBSD ports, etc.).
...
...
2034
   may not be necessary for a quick test with default parameters). Most
2082
   may not be necessary for a quick test with default parameters). Most
2035
   parameters can be more conveniently set from the GUI interface.
2083
   parameters can be more conveniently set from the GUI interface.
2036
2084
2037
     ----------------------------------------------------------------------
2085
     ----------------------------------------------------------------------
2038
2086
2039
  7.1.1. Installing through a package system
2087
  5.1.1. Installing through a package system
2040
2088
2041
   If you use a BSD-type port system or a prebuilt package (DEB, RPM,
2089
   If you use a BSD-type port system or a prebuilt package (DEB, RPM,
2042
   manually or through the system software configuration utility), just
2090
   manually or through the system software configuration utility), just
2043
   follow the usual procedure for your system.
2091
   follow the usual procedure for your system.
2044
2092
2045
     ----------------------------------------------------------------------
2093
     ----------------------------------------------------------------------
2046
2094
2047
  7.1.2. Installing a prebuilt Recoll
2095
  5.1.2. Installing a prebuilt Recoll
2048
2096
2049
   The unpackaged binary versions on the Recoll web site are just compressed
2097
   The unpackaged binary versions on the Recoll web site are just compressed
2050
   tar files of a build tree, where only the useful parts were kept
2098
   tar files of a build tree, where only the useful parts were kept
2051
   (executables and sample configuration).
2099
   (executables and sample configuration).
2052
2100
...
...
2057
   had built the package from source (that is, just type make install). The
2105
   had built the package from source (that is, just type make install). The
2058
   binary trees are built for installation to /usr/local.
2106
   binary trees are built for installation to /usr/local.
2059
2107
2060
     ----------------------------------------------------------------------
2108
     ----------------------------------------------------------------------
2061
2109
2062
7.2. Supporting packages
2110
5.2. Supporting packages
2063
2111
2064
   Recoll uses external applications to index some file types. You need to
2112
   Recoll uses external applications to index some file types. You need to
2065
   install them for the file types that you wish to have indexed (these are
2113
   install them for the file types that you wish to have indexed (these are
2066
   run-time optional dependencies. None is needed for building or running
2114
   run-time optional dependencies. None is needed for building or running
2067
   Recoll except for indexing their specific file type).
2115
   Recoll except for indexing their specific file type).
...
...
2072
2120
2073
   A list of common file types which need external commands follows. Many of
2121
   A list of common file types which need external commands follows. Many of
2074
   the filters need the iconv command, which is not always listed as a
2122
   the filters need the iconv command, which is not always listed as a
2075
   dependency.
2123
   dependency.
2076
2124
2125
   Please note that, due to the relatively dynamic nature of this
2126
   information, the most up to date version is now kept on the Recoll helper
2127
   applications page along with links to the home pages or best
2128
   source/patches download links. The list below is not updated often and may
2129
   be quite stale.
2130
2131
   For many Linux distributions, most of the commands listed can be installed
2132
   from the package repositories. However, the packages are sometimes
2133
   outdated, or not the best version for Recoll, so you should take a look at
2134
   the Recoll helper applications page if a file type is important to you.
2135
2077
   As of Recoll release 1.14, a number of XML-based formats that were handled
2136
   As of Recoll release 1.14, a number of XML-based formats that were handled
2078
   by ad hoc filter code now use xsltproc, which usually comes with libxslt.
2137
   by ad hoc filter code now use the xsltproc command, which usually comes
2079
   These are: abiword, fb2 (ebooks), kword, openoffice, svg.
2138
   with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.
2080
2139
2081
     * Openoffice: supported natively, but needs the unzip command to be
2140
   Now for the list:
2082
       installed.
2083
2141
2142
     * Openoffice files need unzip and xsltproc.
2143
2084
     * PDF: pdftotext is part of the Xpdf or Poppler packages.
2144
     * PDF files need pdftotext which is part of the Xpdf or Poppler
2145
       packages.
2085
2146
2086
     * Postscript: pstotext.
2147
     * Postscript files need pstotext. The original version has an issue with
2148
       shell characters in file names, which is corrected in recent packages.
2149
       See the Recoll helper applications page for more detail.
2087
2150
2088
     * MS Word: antiword.
2151
     * MS Word needs antiword. It is also useful to have wvWare installed as
2152
       it may be used as a fallback for some files which antiword does not
2153
       handle.
2089
2154
2090
     * MS Excel and PowerPoint: catdoc.
2155
     * MS Excel and PowerPoint need catdoc.
2091
2156
2092
     * MS Open XML (docx): needs xsltproc.
2157
     * MS Open XML (docx) needs xsltproc.
2093
2158
2094
     * Wordperfect files: libwpd.
2159
     * Wordperfect files need wpd2html from the libwpd package.
2095
2160
2096
     * RTF: unrtf
2161
     * RTF files need unrtf, which, in its standard version, has much trouble
2162
       with non-western character sets. Check the Recoll helper applications
2163
       page.
2097
2164
2098
     * TeX: Recoll uses the untex program. Your distribution may have a
2165
     * TeX files need untex or detex. Check the Recoll helper applications
2099
       package for it. If it doesn't, there is a copy of the source on the
2166
       page for sources if it's not packaged for your distribution.
2100
       Recoll web site, because the program has no obvious home. The filter
2101
       can also work with detex and will use it if it is installed.
2102
2167
2103
     * dvi: dvips
2168
     * dvi files need dvips.
2104
2169
2105
     * djvu: DjVuLibre
2170
     * djvu files need djvutxt and djvused from the DjVuLibre package.
2106
2171
2107
     * mp3, flac, ogg vorbis: Recoll releases before 1.13 use the id3info
2172
     * Audio files: Recoll releases before 1.13 used the id3info command from
2108
       command from the id3lib package to extract mp3 tag information. (Some
2173
       the id3lib package to extract mp3 tag information, metaflac (standard
2109
       gcc versions after 4.4 may have trouble compiling id3lib. You can find
2174
       flac tools) for flac files, and ogginfo (vorbis tools) for ogg files.
2110
       a workaround here), metaflac (standard flac tools) for flac files, and
2175
       Releases 1.14 and later use a single Python filter based on mutagen
2111
       ogginfo (vorbis tools) for ogg files. Releases 1.14 and later use a
2176
       for all audio file types.
2112
       single Python filter based on mutagen for all audio file types.
2113
2177
2114
     * Pictures: Recoll uses the Exiftool Perl package to extract tag
2178
     * Pictures: Recoll uses the Exiftool Perl package to extract tag
2115
       information. Most image file formats are supported. Note that there
2179
       information. Most image file formats are supported. Note that there
2116
       may not be much interest in indexing the technical tags (image size,
2180
       may not be much interest in indexing the technical tags (image size,
2117
       aperture, etc.). This is only of interest if you store personal tags
2181
       aperture, etc.). This is only of interest if you store personal tags
2118
       or textual descriptions inside the image files.
2182
       or textual descriptions inside the image files.
2119
2183
2120
     * chm: files in microsoft help format need Python and the pychm module
2184
     * chm: files in microsoft help format need Python and the pychm module
2121
       (which needs chmlib).
2185
       (which needs chmlib).
2122
2186
2123
     * ics: up to Recoll 1.13, iCalendar files need Python and the icalendar
2187
     * ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar
2124
       module. For newer versions, icalendar is not needed
2188
       module. icalendar is not needed for newer versions, which use internal
2189
       code.
2125
2190
2126
     * zip: Zip archives need Python (and the standard zipfile module).
2191
     * Zip archives need Python (and the standard zipfile module).
2127
2192
2128
   Text, HTML, mail folders, Openoffice and Scribus files are processed
2193
   Text, HTML, mail folders, and Scribus files are processed internally. Lyx
2129
   internally. Lyx is used to index Lyx files. Many filters need iconv and
2194
   is used to index Lyx files. Many filters need iconv and the standard sed
2130
   the standard sed and awk.
2195
   and awk.
2131
2196
2132
     ----------------------------------------------------------------------
2197
     ----------------------------------------------------------------------
2133
2198
2134
7.3. Building from source
2199
5.3. Building from source
2135
2200
2136
  7.3.1. Prerequisites
2201
  5.3.1. Prerequisites
2137
2202
2138
   C++ compiler. Up to Recoll version 1.13.04, its absence can manifest
2203
   C++ compiler. Up to Recoll version 1.13.04, its absence can manifest
2139
   itself by strange messages about a missing iconv_open.
2204
   itself by strange messages about a missing iconv_open.
2140
2205
2141
   Development files for Xapian core
2206
   Development files for Xapian core.
2207
2208
     Important: If you are building Xapian for an older CPU (before Pentium 4
2209
     or Athlon 64), you need to add the --disable-sse flag to the configure
2210
     command. Otherwise, all Xapian applications will crash with an illegal
2211
     instruction error.
2142
2212
2143
   Development files for Qt.
2213
   Development files for Qt.
2144
2214
2145
   Development files for X11 and zlib.
2215
   Development files for X11 and zlib.
2146
2216
...
...
2154
   not be critical). On Linux systems, the iconv interface is part of libc
2224
   not be critical). On Linux systems, the iconv interface is part of libc
2155
   and you should not need to do anything special.
2225
   and you should not need to do anything special.
2156
2226
2157
     ----------------------------------------------------------------------
2227
     ----------------------------------------------------------------------
2158
2228
2159
  7.3.2. Building
2229
  5.3.2. Building
2160
2230
2161
   Recoll has been built on Linux, FreeBSD, Mac OS X, and Solaris. Most
2231
   Recoll has been built on Linux, FreeBSD, Mac OS X, and Solaris. Most
2162
   versions after 2005 should be ok, maybe some older ones too (Solaris 8 is
2232
   versions after 2005 should be ok, maybe some older ones too (Solaris 8 is
2163
   ok). If you build on another system, and need to modify things, I would
2233
   ok). If you build on another system, and need to modify things, I would
2164
   very much welcome patches.
2234
   very much welcome patches.
...
...
2223
   to manually copy and modify one of the existing files (the new file name
2293
   to manually copy and modify one of the existing files (the new file name
2224
   should be the output of uname -s).
2294
   should be the output of uname -s).
2225
2295
2226
     ----------------------------------------------------------------------
2296
     ----------------------------------------------------------------------
2227
2297
2228
  7.3.3. Installation
2298
  5.3.3. Installation
2229
2299
2230
   Either type make install or execute recollinstall prefix, in the root of
2300
   Either type make install or execute recollinstall prefix, in the root of
2231
   the source tree. This will copy the commands to prefix/bin and the sample
2301
   the source tree. This will copy the commands to prefix/bin and the sample
2232
   configuration files, scripts and other shared data to prefix/share/recoll.
2302
   configuration files, scripts and other shared data to prefix/share/recoll.
2233
2303
...
...
2240
2310
2241
   You can then proceed to configuration.
2311
   You can then proceed to configuration.
2242
2312
2243
     ----------------------------------------------------------------------
2313
     ----------------------------------------------------------------------
2244
2314
2245
7.4. Configuration overview
2315
5.4. Configuration overview
2246
2316
2247
   Most of the parameters specific to the recoll GUI are set through the
2317
   Most of the parameters specific to the recoll GUI are set through the
2248
   Preferences menu and stored in the standard Qt place ($HOME/.qt/recollrc).
2318
   Preferences menu and stored in the standard Qt place ($HOME/.qt/recollrc).
2249
   You probably do not want to edit this by hand.
2319
   You probably do not want to edit this by hand.
2250
2320
...
...
2314
   White space is used for separation inside lists. List elements with
2384
   White space is used for separation inside lists. List elements with
2315
   embedded spaces can be quoted using double-quotes.
2385
   embedded spaces can be quoted using double-quotes.
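
   As a rough illustration of the quoting rule above, the following Python
   sketch splits a list value into its elements. It uses the standard shlex
   module, whose POSIX shell rules only approximate the behaviour described
   here and are not Recoll's own parser. The function name and the sample
   value are made up for the example.

```python
import shlex


def split_list_value(value):
    """Split a whitespace-separated list value.

    Elements containing spaces must be enclosed in double quotes.
    This relies on shlex (shell-like rules), which is an assumption
    and not necessarily identical to Recoll's parsing.
    """
    return shlex.split(value)


if __name__ == "__main__":
    # Hypothetical list value with one quoted element.
    sample = '/home/me/docs "/home/me/My Documents" /srv/share'
    for element in split_list_value(sample):
        print(element)
```

   The quoted element comes back as a single list entry with the quotes
   removed, while the unquoted elements are split on white space.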
2316
2386
2317
     ----------------------------------------------------------------------
2387
     ----------------------------------------------------------------------
2318
2388
2319
  7.4.1. Main configuration file
2389
  5.4.1. Main configuration file
2320
2390
2321
   recoll.conf is the main configuration file. It defines things like what to
2391
   recoll.conf is the main configuration file. It defines things like what to
2322
   index (top directories and things to ignore), and the default character
2392
   index (top directories and things to ignore), and the default character
2323
   set to use for document types which do not specify it internally.
2393
   set to use for document types which do not specify it internally.
2324
2394
...
...
2331
   Configuration menu in the recoll interface. Some can only be set by
2401
   Configuration menu in the recoll interface. Some can only be set by
2332
   editing the configuration file.
2402
   editing the configuration file.
2333
2403
2334
     ----------------------------------------------------------------------
2404
     ----------------------------------------------------------------------
2335
2405
2336
    7.4.1.1. Parameters affecting what documents we index:
2406
    5.4.1.1. Parameters affecting what documents we index:
2337
2407
2338
   topdirs
2408
   topdirs
2339
2409
2340
           Specifies the list of directories or files to index (recursively
2410
           Specifies the list of directories or files to index (recursively
2341
           for directories). You can use symbolic links as elements of this
2411
           for directories). You can use symbolic links as elements of this
...
...
2454
           Beagle plugin as ~/.beagle/ToIndex so there should be no need to
2524
           Beagle plugin as ~/.beagle/ToIndex so there should be no need to
2455
           change it.
2525
           change it.
2456
2526
2457
     ----------------------------------------------------------------------
2527
     ----------------------------------------------------------------------
2458
2528
2459
    7.4.1.2. Parameters affecting how we generate terms:
2529
    5.4.1.2. Parameters affecting how we generate terms:
2460
2530
2461
   Changing some of these parameters will imply a full reindex. Also, when
2531
   Changing some of these parameters will imply a full reindex. Also, when
2462
   using multiple indexes, it may not make sense to search indexes that don't
2532
   using multiple indexes, it may not make sense to search indexes that don't
2463
   share the values for these parameters, because they usually affect both
2533
   share the values for these parameters, because they usually affect both
2464
   search and index operations.
2534
   search and index operations.
...
...
2521
           localfields= rclaptg=gnus:other = val, then select specifier
2591
           localfields= rclaptg=gnus:other = val, then select specifier
2522
           viewer with mimetype|tag=... in mimeview.
2592
           viewer with mimetype|tag=... in mimeview.
2523
2593
2524
     ----------------------------------------------------------------------
2594
     ----------------------------------------------------------------------
2525
2595
2526
    7.4.1.3. Parameters affecting where and how we store things:
2596
    5.4.1.3. Parameters affecting where and how we store things:
2527
2597
2528
   dbdir
2598
   dbdir
2529
2599
2530
           The name of the Xapian data directory. It will be created if
2600
           The name of the Xapian data directory. It will be created if
2531
           needed when the index is initialized. If this is not an absolute
2601
           needed when the index is initialized. If this is not an absolute
...
...
2571
           default, which is flushing every 10000 documents (memory usage
2641
           default, which is flushing every 10000 documents (memory usage
2572
           depends on average document size). The default value is 10.
2642
           depends on average document size). The default value is 10.
2573
2643
2574
     ----------------------------------------------------------------------
2644
     ----------------------------------------------------------------------
2575
2645
2576
    7.4.1.4. Miscellaneous parameters:
2646
    5.4.1.4. Miscellaneous parameters:
2577
2647
2578
   loglevel,daemloglevel
2648
   loglevel,daemloglevel
2579
2649
2580
           Verbosity level for recoll and recollindex. A value of 4 lists
2650
           Verbosity level for recoll and recollindex. A value of 4 lists
2581
           quite a lot of debug/information messages. 2 only lists errors.
2651
           quite a lot of debug/information messages. 2 only lists errors.
...
...
2637
           internal value is available (ie: for plain text files). This does
2707
           internal value is available (ie: for plain text files). This does
2638
           not work well in general, and should probably not be used.
2708
           not work well in general, and should probably not be used.
2639
2709
2640
     ----------------------------------------------------------------------
2710
     ----------------------------------------------------------------------
2641
2711
2642
  7.4.2. The fields file
2712
  5.4.2. The fields file
2643
2713
2644
   This file contains information about dynamic fields handling in Recoll.
2714
   This file contains information about dynamic fields handling in Recoll.
2645
   Some very basic fields have hard-wired behaviour, and, mostly, you should
2715
   Some very basic fields have hard-wired behaviour, and, mostly, you should
2646
   not change the original data inside the fields file. But you can create
2716
   not change the original data inside the fields file. But you can create
2647
   custom fields fitting your data and handle them just like they were native
2717
   custom fields fitting your data and handle them just like they were native
...
...
2699
 # mailmytag field name
2769
 # mailmytag field name
2700
 x-my-tag = mailmytag
2770
 x-my-tag = mailmytag
2701
2771
2702
     ----------------------------------------------------------------------
2772
     ----------------------------------------------------------------------
2703
2773
2704
  7.4.3. The mimemap file
2774
  5.4.3. The mimemap file
2705
2775
2706
   mimemap specifies the file name extension to mime type mappings.
2776
   mimemap specifies the file name extension to mime type mappings.
2707
2777
2708
   For file names without an extension, or with an unknown one, the system's
2778
   For file names without an extension, or with an unknown one, the system's
2709
   file -i command will be executed to determine the mime type (this can be
2779
   file -i command will be executed to determine the mime type (this can be
...
...
2725
   given Recoll version. Having it there avoids cluttering the more
2795
   given Recoll version. Having it there avoids cluttering the more
2726
   user-oriented and locally customized skippedNames.
2796
   user-oriented and locally customized skippedNames.
2727
2797
2728
     ----------------------------------------------------------------------
2798
     ----------------------------------------------------------------------
2729
2799
2730
  7.4.4. The mimeconf file
2800
  5.4.4. The mimeconf file
2731
2801
2732
   mimeconf specifies how the different mime types are handled for indexing,
2802
   mimeconf specifies how the different mime types are handled for indexing,
2733
   and which icons are displayed in the recoll result lists.
2803
   and which icons are displayed in the recoll result lists.
2734
2804
2735
   Changing the parameters in the [index] section is probably not a good idea
2805
   Changing the parameters in the [index] section is probably not a good idea
...
...
2739
   recoll in the result lists (the values are the basenames of the png images
2809
   recoll in the result lists (the values are the basenames of the png images
2740
   inside the iconsdir directory, specified in recoll.conf).
2810
   inside the iconsdir directory, specified in recoll.conf).
2741
2811
2742
     ----------------------------------------------------------------------
2812
     ----------------------------------------------------------------------
2743
2813
2744
  7.4.5. The mimeview file
2814
  5.4.5. The mimeview file
2745
2815
2746
   mimeview specifies which programs are started when you click on an Edit
2816
   mimeview specifies which programs are started when you click on an Open
2747
   link in a result list. E.g.: HTML is normally displayed using firefox, but
2817
   link in a result list. E.g.: HTML is normally displayed using firefox, but
2748
   you may prefer Konqueror; your openoffice.org program might be named
2818
   you may prefer Konqueror; your openoffice.org program might be named
2749
   ooffice instead of openoffice, etc.
2819
   ooffice instead of openoffice, etc.
2750
2820
2751
   Changes to this file can be done by direct editing, or through the recoll
2821
   Changes to this file can be done by direct editing, or through the recoll
2752
   user preferences dialog.
2822
   user preferences dialog.
2823
2824
   If Use desktop preferences to choose document editor is checked in the
2825
   Recoll GUI user preferences, all mimeview entries will be ignored except
2826
   the one labelled application/x-all (which is set to use xdg-open by
2827
   default).
2753
2828
2754
   As for the other configuration files, the normal usage is to have a
2829
   As for the other configuration files, the normal usage is to have a
2755
   mimeview inside your own configuration directory, with just the
2830
   mimeview inside your own configuration directory, with just the
2756
   non-default entries, which will override those from the central
2831
   non-default entries, which will override those from the central
2757
   configuration file.
2832
   configuration file.
...
...
2761
   The keys in the file are normally mime types. You can add an application
2836
   The keys in the file are normally mime types. You can add an application
2762
   tag to specialize the choice for an area of the filesystem (using a
2837
   tag to specialize the choice for an area of the filesystem (using a
2763
   localfields specification in mimeconf). The syntax for the key is
2838
   localfields specification in mimeconf). The syntax for the key is
2764
   mimetype|tag
2839
   mimetype|tag
2765
2840
2766
   If Use desktop preferences to choose document editor is checked in the
2767
   user preferences, all mimeview entries will be ignored except the one
2768
   labelled application/x-all (which is set to use xdg-open by default).
2769
2770
   The nouncompforviewmts entry (placed at the top level, outside of the
2841
   The nouncompforviewmts entry (placed at the top level, outside of the
2771
   [view] section), holds a list of mime types that should not be
2842
   [view] section), holds a list of mime types that should not be
2772
   uncompressed before starting the viewer (if they are found compressed, ie:
2843
   uncompressed before starting the viewer (if they are found compressed, ie:
2773
   mydoc.doc.gz).
2844
   mydoc.doc.gz).
2774
2845
2775
     ----------------------------------------------------------------------
2846
   The right side of each assignment holds a command to be executed for
2847
   opening the file. The following substitutions are performed:
2776
2848
2849
     * %D. Document date
2850
2851
     * %f. File name. This may be the name of a temporary file if it was
2852
       necessary to create one (ie: to extract a subdocument from a
2853
       container).
2854
2855
     * %F. Original file name. Same as %f except if a temporary file is used.
2856
2857
     * %i. Internal path, for subdocuments of containers. The format depends
2858
       on the container type. If this appears in the command line, Recoll
2859
       will not create a temporary file to extract the subdocument, expecting
2860
       the called application (possibly a script) to be able to handle it.
2861
2862
     * %M. Mime type
2863
2864
     * %U, %u. Url.
2865
2866
   In addition to the predefined values above, all strings like %(fieldname)
2867
   will be replaced by the value of the field named fieldname for the
2868
   document. This could be used in combination with field customisation to
2869
   help with opening the document.
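
   The substitutions listed above can be pictured with the short Python
   sketch below. It is only an illustration of the idea, not Recoll's
   implementation: the function name expand_command, the sample command
   line and the sample values are all hypothetical, and a real
   implementation would also have to deal with quoting of file names.

```python
import re

# Matches either %(fieldname) or a single-letter placeholder such as %f.
_PLACEHOLDER = re.compile(r"%\((\w+)\)|%([A-Za-z])")


def expand_command(template, values, fields):
    """Expand placeholders in a viewer command template.

    values: maps single letters (e.g. "f", "M") to their replacement.
    fields: maps document field names to values, for %(fieldname).
    Unknown single-letter placeholders are left untouched; unknown
    field names expand to an empty string (an assumption made here).
    """
    def replace(match):
        field_name, letter = match.group(1), match.group(2)
        if field_name is not None:
            return fields.get(field_name, "")
        return values.get(letter, match.group(0))

    return _PLACEHOLDER.sub(replace, template)


if __name__ == "__main__":
    # Hypothetical command template and document data.
    command = expand_command(
        "someviewer --type %M --author %(author) %f",
        {"M": "text/html", "f": "/tmp/doc.html"},
        {"author": "jane"},
    )
    print(command)
```

   Keeping the single-letter values and the document fields in two separate
   tables mirrors the distinction made in the text between the predefined
   substitutions and the %(fieldname) form.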
2870
2871
     ----------------------------------------------------------------------
2872
2777
  7.4.6. Examples of configuration adjustments
2873
  5.4.6. Examples of configuration adjustments
2778
2874
2779
    7.4.6.1. Adding an external viewer for a non-indexed type
2875
    5.4.6.1. Adding an external viewer for a non-indexed type
2780
2876
2781
   Imagine that you have some kind of file which does not have indexable
2877
   Imagine that you have some kind of file which does not have indexable
2782
   content, but for which you would like to have a functional Edit link in
2878
   content, but for which you would like to have a functional Open link in
2783
   the result list (when found by file name). The file names end in .blob and
2879
   the result list (when found by file name). The file names end in .blob and
2784
   can be displayed by application blobviewer.
2880
   can be displayed by application blobviewer.
2785
2881
2786
   You need two entries in the configuration files for this to work:
2882
   You need two entries in the configuration files for this to work:
2787
2883
...
...
2806
   configuration, which you do not need to alter. mimeview can also be
2902
   configuration, which you do not need to alter. mimeview can also be
2807
   modified from the GUI.
2903
   modified from the GUI.
2808
2904
2809
     ----------------------------------------------------------------------
2905
     ----------------------------------------------------------------------
2810
2906
2811
    7.4.6.2. Adding indexing support for a new file type
2907
    5.4.6.2. Adding indexing support for a new file type
2812
2908
2813
   Let us now imagine that the above .blob files actually contain indexable
2909
   Let us now imagine that the above .blob files actually contain indexable
2814
   text and that you know how to extract it with a command line program.
2910
   text and that you know how to extract it with a command line program.
2815
   Getting Recoll to index the files is easy. You need to perform the above
2911
   Getting Recoll to index the files is easy. You need to perform the above
2816
   alteration, and also to add data to the mimeconf file (typically in
2912
   alteration, and also to add data to the mimeconf file (typically in
...
...
2836
2932
2837
   The filter programming section describes in more detail how to write a
2933
   The filter programming section describes in more detail how to write a
2838
   filter.
2934
   filter.
2839
2935
2840
     ----------------------------------------------------------------------
2936
     ----------------------------------------------------------------------
2841
2842
7.5. The KDE Kicker Recoll applet
2843
2844
   The Recoll source tree contains the source code to the recoll_applet, a
2845
   small application derived from the find_applet. This can be used to add a
2846
   small Recoll launcher to the KDE panel.
2847
2848
   The applet is not automatically built with the main Recoll programs, nor
2849
   is it included with the main source distribution (because the KDE build
2850
   boilerplate makes it relatively big). You can download its source from the
2851
   recoll.org download page. Use the omnipotent configure;make;make install
2852
   incantation to build and install.
2853
2854
   You can then add the applet to the panel by right-clicking the panel and
2855
   choosing the Add applet entry.
2856
2857
   The recoll_applet has a small text window where you can type a Recoll
2858
   query (in query language form), and an icon which can be used to restrict
2859
   the search to certain types of files. It is quite primitive, and launches
2860
   a new recoll GUI instance every time (even if it is already running). You
2861
   may find it useful anyway.
2862
2863
     ----------------------------------------------------------------------