
a/src/README b/src/README
...
...
6
6
7
  Jean-Francois Dockes
7
  Jean-Francois Dockes
8
8
9
   <jfd@recoll.org>
9
   <jfd@recoll.org>
10
10
11
   Copyright (c) 2005-2014 Jean-Francois Dockes
11
   Copyright (c) 2005-2015 Jean-Francois Dockes
12
12
13
   Permission is granted to copy, distribute and/or modify this document
13
   Permission is granted to copy, distribute and/or modify this document
14
   under the terms of the GNU Free Documentation License, Version 1.3 or any
14
   under the terms of the GNU Free Documentation License, Version 1.3 or any
15
   later version published by the Free Software Foundation; with no Invariant
15
   later version published by the Free Software Foundation; with no Invariant
16
   Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the
16
   Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the
17
   license can be found at the following location: GNU web site.
17
   license can be found at the following location: GNU web site.
18
18
19
   This document introduces full text search notions and describes the
19
   This document introduces full text search notions and describes the
20
   installation and use of the Recoll application. It currently describes
20
   installation and use of the Recoll application. This version describes
21
   Recoll 1.20.
21
   Recoll 1.21.
22
22
23
     ----------------------------------------------------------------------
23
     ----------------------------------------------------------------------
24
24
25
   Table of Contents
25
   Table of Contents
26
26
...
...
40
40
41
                             2.1.2. Configurations, multiple indexes
41
                             2.1.2. Configurations, multiple indexes
42
42
43
                             2.1.3. Document types
43
                             2.1.3. Document types
44
44
45
                             2.1.4. Indexing failures
46
45
                             2.1.4. Recovery
47
                             2.1.5. Recovery
46
48
47
                2.2. Index storage
49
                2.2. Index storage
48
50
49
                             2.2.1. Xapian index formats
51
                             2.2.1. Xapian index formats
50
52
...
...
105
                             3.1.12. Sorting search results and collapsing
107
                             3.1.12. Sorting search results and collapsing
106
                             duplicates
108
                             duplicates
107
109
108
                             3.1.13. Search tips, shortcuts
110
                             3.1.13. Search tips, shortcuts
109
111
112
                             3.1.14. Saving and restoring queries (1.21 and
113
                             later)
114
110
                             3.1.14. Customizing the search interface
115
                             3.1.15. Customizing the search interface
111
116
112
                3.2. Searching with the KDE KIO slave
117
                3.2. Searching with the KDE KIO slave
113
118
114
                             3.2.1. What's this
119
                             3.2.1. What's this
115
120
...
...
161
166
162
   5. Installation and configuration
167
   5. Installation and configuration
163
168
164
                5.1. Installing a binary copy
169
                5.1. Installing a binary copy
165
170
166
                             5.1.1. Installing through a package system
167
168
                             5.1.2. Installing a prebuilt Recoll
169
170
                5.2. Supporting packages
171
                5.2. Supporting packages
171
172
172
                5.3. Building from source
173
                5.3. Building from source
173
174
174
                             5.3.1. Prerequisites
175
                             5.3.1. Prerequisites
...
...
177
178
178
                             5.3.3. Installation
179
                             5.3.3. Installation
179
180
180
                5.4. Configuration overview
181
                5.4. Configuration overview
181
182
183
                             5.4.1. Environment variables
184
182
                             5.4.1. The main configuration file, recoll.conf
185
                             5.4.2. The main configuration file, recoll.conf
183
186
184
                             5.4.2. The fields file
187
                             5.4.3. The fields file
185
188
186
                             5.4.3. The mimemap file
189
                             5.4.4. The mimemap file
187
190
188
                             5.4.4. The mimeconf file
191
                             5.4.5. The mimeconf file
189
192
190
                             5.4.5. The mimeview file
193
                             5.4.6. The mimeview file
191
194
192
                             5.4.6. The ptrans file
195
                             5.4.7. The ptrans file
193
196
194
                             5.4.7. Examples of configuration adjustments
197
                             5.4.8. Examples of configuration adjustments
195
198
196
Chapter 1. Introduction
199
Chapter 1. Introduction
197
200
198
1.1. Giving it a try
201
1.1. Giving it a try
199
202
...
...
350
   documents will only be processed if they have been modified since the last
353
   documents will only be processed if they have been modified since the last
351
   run. On the first execution, all documents will need processing. A full
354
   run. On the first execution, all documents will need processing. A full
352
   index build can be forced later by specifying an option to the indexing
355
   index build can be forced later by specifying an option to the indexing
353
   command (recollindex -z or -Z).
356
   command (recollindex -z or -Z).
354
357
358
   recollindex skips files which caused an error during a previous pass. This
359
   is a performance optimization, and a new behaviour in version 1.21 (failed
360
   files were always retried by previous versions). The command line option
361
   -k can be set to retry failed files, for example after updating a filter.
362
355
   The following sections give an overview of different aspects of the
363
   The following sections give an overview of different aspects of the
356
   indexing processes and configuration, with links to detailed sections.
364
   indexing processes and configuration, with links to detailed sections.
365
366
   Depending on your data, temporary files may be needed during indexing,
367
   some of them possibly quite big. You can use the RECOLL_TMPDIR or TMPDIR
368
   environment variables to determine where they are created (the default is
369
   to use /tmp). Using TMPDIR has the nice property that it may also be taken
370
   into account by auxiliary commands executed by recollindex.
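   The lookup order can be sketched as follows. This is only an illustration
   of the documented priority (RECOLL_TMPDIR first, then TMPDIR, then /tmp),
   not Recoll's actual code. The function name is hypothetical, and treating
   an empty variable as unset is an assumption.

```python
import os


def recoll_tmpdir(env=None):
    """Return the temporary directory following the documented priority.

    Hypothetical helper: RECOLL_TMPDIR wins over TMPDIR, and /tmp is the
    fallback. An empty value is treated as unset (an assumption).
    """
    env = os.environ if env is None else env
    for name in ("RECOLL_TMPDIR", "TMPDIR"):
        value = env.get(name)
        if value:
            return value
    return "/tmp"


if __name__ == "__main__":
    print(recoll_tmpdir())
```

   Passing a plain dictionary instead of the real environment makes the
   order easy to check without changing any variable.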
357
371
358
  2.1.1. Indexing modes
372
  2.1.1. Indexing modes
359
373
360
   Recoll indexing can be performed along two different modes:
374
   Recoll indexing can be performed along two different modes:
361
375
...
...
460
474
461
   excludedmimetypes or indexedmimetypes, can be set either by editing the
475
   excludedmimetypes or indexedmimetypes, can be set either by editing the
462
   main configuration file (recoll.conf), or from the GUI index configuration
476
   main configuration file (recoll.conf), or from the GUI index configuration
463
   tool.
477
   tool.
464
478
479
  2.1.4. Indexing failures
480
481
   Indexing may fail for some documents, for a number of reasons: a helper
482
   program may be missing, the document may be corrupt, we may fail to
483
   uncompress a file because no file system space is available, etc.
484
485
   Recoll versions prior to 1.21 always retried indexing files which had
486
   previously caused an error. This guaranteed that anything that may have
487
   become indexable (for example because a helper had been installed) would
488
   be indexed. However this was bad for performance because some indexing
489
   failures may be quite costly (for example failing to uncompress a big file
490
   because of insufficient disk space).
491
492
   The indexer in Recoll versions 1.21 and later does not retry failed files by
493
   default. Retrying will only occur if an explicit option (-k) is set on the
494
   recollindex command line, or if a script executed when recollindex starts
495
   up says so. The script is defined by a configuration variable
496
   (checkneedretryindexscript), and makes a rather lame attempt at deciding
497
   if a helper command may have been installed, by checking if any of the
498
   common bin directories have changed.
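   The idea of checking the bin directories for changes can be sketched as
   below. This is not the script shipped with Recoll: the function names are
   hypothetical, and comparing directory modification times is an assumption
   about what "changed" means.

```python
import os


def bin_dirs_signature(dirs):
    """Map each existing directory to its modification time (nanoseconds).

    Hypothetical helper: installing or removing a program in a directory
    normally updates that directory's modification time.
    """
    signature = {}
    for path in dirs:
        try:
            signature[path] = os.stat(path).st_mtime_ns
        except OSError:
            # Missing or unreadable directories are simply ignored.
            continue
    return signature


def should_retry(previous, current):
    """Suggest retrying failed files when any directory signature differs."""
    return previous != current


if __name__ == "__main__":
    # The directory list is an example, not Recoll's own list.
    print(bin_dirs_signature(["/usr/bin", "/usr/local/bin"]))
```

   A caller would store the signature after each indexing pass and compare
   it with a fresh one at the next start.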
499
465
  2.1.4. Recovery
500
  2.1.5. Recovery
466
501
467
   In the rare case where the index becomes corrupted (which can signal
502
   In the rare case where the index becomes corrupted (which can signal
468
   itself by weird search results or crashes), the index files need to be
503
   itself by weird search results or crashes), the index files need to be
469
   erased before restarting a clean indexing pass. Just delete the xapiandb
504
   erased before restarting a clean indexing pass. Just delete the xapiandb
470
   directory (see next section), or, alternatively, start the next
505
   directory (see next section), or, alternatively, start the next
...
...
783
   index first. This will not have the "clean start" aspect of -z, but the
818
   index first. This will not have the "clean start" aspect of -z, but the
784
   advantage is that the index will remain available for querying while it is
819
   advantage is that the index will remain available for querying while it is
785
   rebuilt, which can be a significant advantage if it is very big (some
820
   rebuilt, which can be a significant advantage if it is very big (some
786
   installations need days for a full index rebuild).
821
   installations need days for a full index rebuild).
787
822
823
   Option -k will force retrying files which previously failed to be indexed,
824
   for example because of a missing helper program.
825
788
   Of special interest also, maybe, are the -i and -f options. -i allows
826
   Of special interest also, maybe, are the -i and -f options. -i allows
789
   indexing an explicit list of files (given as command line parameters or
827
   indexing an explicit list of files (given as command line parameters or
790
   read on stdin). -f tells recollindex to ignore file selection parameters
828
   read on stdin). -f tells recollindex to ignore file selection parameters
791
   from the configuration. Together, these options allow building a custom
829
   from the configuration. Together, these options allow building a custom
792
   file selection process for some area of the file system, by adding the top
830
   file selection process for some area of the file system, by adding the top
...
...
865
903
866
   If you use the daemon completely out of an X11 session, you need to add
904
   If you use the daemon completely out of an X11 session, you need to add
867
   option -x to disable X11 session monitoring (else the daemon will not
905
   option -x to disable X11 session monitoring (else the daemon will not
868
   start).
906
   start).
869
907
870
   By default, the messages from the indexing daemon will be discarded. You
908
   By default, the messages from the indexing daemon will be sent to the same
909
   file as those from the interactive commands (logfilename). You may want to
871
   may want to change this by setting the daemlogfilename and daemloglevel
910
   change this by setting the daemlogfilename and daemloglevel configuration
872
   configuration parameters. Also the log file will only be truncated when
911
   parameters. Also the log file will only be truncated when the daemon
873
   the daemon starts. If the daemon runs permanently, the log file may grow
912
   starts. If the daemon runs permanently, the log file may grow quite big,
874
   quite big, depending on the log level.
913
   depending on the log level.
875
914
876
   When building Recoll, the real time indexing support can be customised
915
   When building Recoll, the real time indexing support can be customised
877
   during package configuration with the --with[out]-fam or
916
   during package configuration with the --with[out]-fam or
878
   --with[out]-inotify options. The default is currently to include inotify
917
   --with[out]-inotify options. The default is currently to include inotify
879
   monitoring on systems that support it, and, as of Recoll 1.17, gamin
918
   monitoring on systems that support it, and, as of Recoll 1.17, gamin
...
...
944
   printed is for east-asian languages (Chinese, Japanese, Korean). Words
983
   printed is for east-asian languages (Chinese, Japanese, Korean). Words
945
   composed of single or multiple characters should be entered separated by
984
   composed of single or multiple characters should be entered separated by
946
   white space in this case (they would typically be printed without white
985
   white space in this case (they would typically be printed without white
947
   space).
986
   space).
948
987
988
   Some searches can be quite complex, and you may want to re-use them later,
989
   perhaps with some tweaking. Recoll versions 1.21 and later can save and
990
   restore searches, using XML files. See Saving and restoring queries.
991
949
  3.1.1. Simple search
992
  3.1.1. Simple search
950
993
951
    1. Start the recoll program.
994
    1. Start the recoll program.
952
995
953
    2. Possibly choose a search mode: Any term, All terms, File name or Query
996
    2. Possibly choose a search mode: Any term, All terms, File name or Query
...
...
1370
  3.1.8. Complex/advanced search
1413
  3.1.8. Complex/advanced search
1371
1414
1372
   The advanced search dialog helps you build more complex queries without
1415
   The advanced search dialog helps you build more complex queries without
1373
   memorizing the search language constructs. It can be opened through the
1416
   memorizing the search language constructs. It can be opened through the
1374
   Tools menu or through the main toolbar.
1417
   Tools menu or through the main toolbar.
1418
1419
   Recoll keeps a history of searches. See Advanced search history.
1375
1420
1376
   The dialog has two tabs:
1421
   The dialog has two tabs:
1377
1422
1378
    1. The first tab lets you specify terms to search for, and permits
1423
    1. The first tab lets you specify terms to search for, and permits
1379
       specifying multiple clauses which are combined to build the search.
1424
       specifying multiple clauses which are combined to build the search.
...
...
1743
   Printing previews. Entering Ctrl-P in a preview window will print the
1788
   Printing previews. Entering Ctrl-P in a preview window will print the
1744
   currently displayed text.
1789
   currently displayed text.
1745
1790
1746
   Quitting. Entering Ctrl-Q almost anywhere will close the application.
1791
   Quitting. Entering Ctrl-Q almost anywhere will close the application.
1747
1792
1793
  3.1.14. Saving and restoring queries (1.21 and later)
1794
1795
   Both simple and advanced query dialogs save recent history, but the amount
1796
   is limited: old queries will eventually be forgotten. Also, important
1797
   queries may be difficult to find among others. This is why both types of
1798
   queries can also be explicitly saved to files, from the GUI menus: File
1799
   -> Save last query / Load last query
1800
1801
   The default location for saved queries is a subdirectory of the current
1802
   configuration directory, but saved queries are ordinary files and can be
1803
   written or moved anywhere.
1804
1805
   Some of the saved query parameters are part of the preferences (e.g.
1806
   autophrase or the active external indexes), and may differ when the query
1807
   is loaded from the time it was saved. In this case, Recoll will warn of
1808
   the differences, but will not change the user preferences.
1809
1748
  3.1.14. Customizing the search interface
1810
  3.1.15. Customizing the search interface
1749
1811
1750
   You can customize some aspects of the search interface by using the GUI
1812
   You can customize some aspects of the search interface by using the GUI
1751
   configuration entry in the Preferences menu.
1813
   configuration entry in the Preferences menu.
1752
1814
1753
   There are several tabs in the dialog, dealing with the interface itself,
1815
   There are several tabs in the dialog, dealing with the interface itself,
...
...
1910
   always implicitly active. If this is not desirable, you can set up your
1972
   always implicitly active. If this is not desirable, you can set up your
1911
   configuration so that it indexes, for example, an empty directory. An
1973
   configuration so that it indexes, for example, an empty directory. An
1912
   alternative indexer may also need to implement a way of purging the index
1974
   alternative indexer may also need to implement a way of purging the index
1913
   from stale data,
1975
   from stale data,
1914
1976
1915
    3.1.14.1. The result list format
1977
    3.1.15.1. The result list format
1978
1979
   Newer versions of Recoll (from 1.17) normally use WebKit HTML widgets for
1980
   the result list and the snippets window (this may be disabled at build
1981
   time). Total customisation is possible with full support for CSS and
1982
   Javascript. Conversely, there are limits to what you can do with the older
1983
   Qt QTextBrowser, but still, it is possible to decide what data each result
1984
   will contain, and how it will be displayed.
1916
1985
1917
   The result list presentation can be exhaustively customized by adjusting
1986
   The result list presentation can be exhaustively customized by adjusting
1918
   two elements:
1987
   two elements:
1919
1988
1920
     o The paragraph format
1989
     o The paragraph format
1921
1990
1922
     o HTML code inside the header section
1991
     o HTML code inside the header section. For versions 1.21 and later, this
1992
       is also used for the snippets window
1923
1993
1924
   These can be edited from the Result list tab of the GUI configuration.
1994
   The paragraph format and the header fragment can be edited from the Result
1995
   list tab of the GUI configuration.
1925
1996
1926
   Newer versions of Recoll (from 1.17) use a WebKit HTML object by default
1997
   The header fragment is used both for the result list and the snippets
1927
   (this may be disabled at build time), and total customisation is possible
1998
   window. The snippets list is a table and has a snippets class attribute.
1928
   with full support for CSS and Javascript. Conversely, there are limits to
1999
   Each paragraph in the result list is a table, with class respar, but this
1929
   what you can do with the older Qt QTextBrowser, but still, it is possible
2000
   can be changed by editing the paragraph format.
1930
   to decide what data each result will contain, and how it will be
1931
   displayed.
1932
2001
1933
   No more detail will be given about the header part (only useful with the
1934
   WebKit build), if there are restrictions to what you can do, they are
1935
   beyond this author's HTML/CSS/Javascript abilities... There are a few
1936
   examples on the page about customising the result list on the Recoll web
2002
   There are a few examples on the page about customising the result list on
1937
   site.
2003
   the Recoll web site.
1938
2004
1939
      The paragraph format
2005
      The paragraph format
1940
2006
1941
   This is an arbitrary HTML string where the following printf-like %
2007
   This is an arbitrary HTML string where the following printf-like %
1942
   substitutions will be performed:
2008
   substitutions will be performed:
...
...
1995
   example candidate would be the recipient field which is generated by the
2061
   example candidate would be the recipient field which is generated by the
1996
   message input handlers.
2062
   message input handlers.
1997
2063
1998
   The default value for the paragraph format string is:
2064
   The default value for the paragraph format string is:
1999
2065
2000
 <img src="%I" align="left">%R %S %L &nbsp;&nbsp;<b>%T</b><br>
2066
     "<table class=\"respar\">\n"
2001
 %M&nbsp;%D&nbsp;&nbsp;&nbsp;<i>%U</i>&nbsp;%i<br>
2067
     "<tr>\n"
2002
 %A %K
2068
     "<td><a href='%U'><img src='%I' width='64'></a></td>\n"
2069
     "<td>%L &nbsp;<i>%S</i> &nbsp;&nbsp;<b>%T</b><br>\n"
2070
     "<span style='white-space:nowrap'><i>%M</i>&nbsp;%D</span>&nbsp;&nbsp;&nbsp; <i>%U</i>&nbsp;%i<br>\n"
2071
     "%A %K</td>\n"
2072
     "</tr></table>\n"
2003
2073
2004
   You may, for example, try the following for a more web-like experience:
2074
   You may, for example, try the following for a more web-like experience:
2005
2075
2006
 <u><b><a href="P%N">%T</a></b></u><br>
2076
 <u><b><a href="P%N">%T</a></b></u><br>
2007
 %A<font color=#008000>%U - %S</font> - %L
2077
 %A<font color=#008000>%U - %S</font> - %L
...
...
2203
   or lennon and either live or unplugged but not potatoes (in any part of
2273
   or lennon and either live or unplugged but not potatoes (in any part of
2204
   the document).
2274
   the document).
2205
2275
2206
   An element is composed of an optional field specification, and a value,
2276
   An element is composed of an optional field specification, and a value,
2207
   separated by a colon (the field separator is the last colon in the
2277
   separated by a colon (the field separator is the last colon in the
2208
   element). Example: Eugenie, author:balzac, dc:title:grandet
2278
   element). Examples: Eugenie, author:balzac, dc:title:grandet
2279
   dc:title:"eugenie grandet"
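   The "last colon" rule can be illustrated with a few lines of Python. This
   is a sketch of the rule as stated above, not Recoll's parser, and the
   function name is hypothetical. It does not handle a quoted value which
   itself contains a colon.

```python
def split_element(element):
    """Split a query element into (field, value) at its last colon.

    Hypothetical helper: the field is None when the element has no colon.
    """
    field, sep, value = element.rpartition(":")
    return (field if sep else None, value)


if __name__ == "__main__":
    for element in ("Eugenie", "author:balzac", "dc:title:grandet"):
        print(element, "->", split_element(element))
```

   With this rule, dc:title:grandet yields the field dc:title and the value
   grandet.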
2209
2280
2210
   The colon, if present, means "contains". Xesam defines other relations,
2281
   The colon, if present, means "contains". Xesam defines other relations,
2211
   which are mostly unsupported for now (except in special cases, described
2282
   which are mostly unsupported for now (except in special cases, described
2212
   further down).
2283
   further down).
2213
2284
...
...
2216
   Beatles OR Lennon. The OR must be entered literally (capitals), and it has
2287
   Beatles OR Lennon. The OR must be entered literally (capitals), and it has
2217
   priority over the AND associations: word1 word2 OR word3 means word1 AND
2288
   priority over the AND associations: word1 word2 OR word3 means word1 AND
2218
   (word2 OR word3) not (word1 AND word2) OR word3. Explicit parenthesis are
2289
   (word2 OR word3) not (word1 AND word2) OR word3. Explicit parenthesis are
2219
   not supported.
2290
   not supported.
2220
2291
2292
   As of Recoll 1.21, you can use parentheses to group elements, which will
2293
   sometimes make things clearer, and may allow expressing combinations which
2294
   would have been difficult otherwise.
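   The priority of OR over the implicit AND can be sketched as follows. This
   only illustrates the grouping of plain words: phrases, parentheses, fields
   and negation are deliberately ignored, and the function name is
   hypothetical.

```python
def group_terms(query):
    """Group a flat query into AND-ed clauses of OR-ed alternatives.

    Hypothetical helper: each inner list holds terms joined by OR, and the
    inner lists are implicitly joined by AND.
    """
    clauses = []
    expect_alternative = False
    for token in query.split():
        if token == "OR":  # only the capitalized form is an operator
            if not clauses:
                raise ValueError("OR needs a term on its left")
            expect_alternative = True
        elif expect_alternative:
            clauses[-1].append(token)
            expect_alternative = False
        else:
            clauses.append([token])
    if expect_alternative:
        raise ValueError("OR needs a term on its right")
    return clauses


if __name__ == "__main__":
    print(group_terms("word1 word2 OR word3"))
```

   For word1 word2 OR word3, the result has two clauses: word1 alone, and
   the alternatives word2 and word3.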
2295
2221
   An element preceded by a - specifies a term that should not appear. Pure
2296
   An element preceded by a - specifies a term that should not appear.
2222
   negative queries are forbidden.
2223
2297
2224
   As usual, words inside quotes define a phrase (the order of words is
2298
   As usual, words inside quotes define a phrase (the order of words is
2225
   significant), so that title:"prejudice pride" is not the same as
2299
   significant), so that title:"prejudice pride" is not the same as
2226
   title:prejudice title:pride, and is unlikely to find a result.
2300
   title:prejudice title:pride, and is unlikely to find a result.
2227
2301
2302
   Words inside phrases and capitalized words are not stem-expanded.
2303
   Wildcards may be used anywhere inside a term. Specifying a wild-card on
2304
   the left of a term can produce a very slow search (or even an incorrect
2305
   one if the expansion is truncated because of excessive size). Also see
2306
   More about wildcards.
2307
2228
   To save you some typing, recent Recoll versions (1.20 and later) interpret
2308
   To save you some typing, recent Recoll versions (1.20 and later) interpret
2229
   a comma-separated list of terms as an AND list inside the field. Use slash
2309
   a comma-separated list of terms as an AND list inside the field. Use slash
2230
   characters ('/') for an OR list. No white space is allowed. So
2310
   characters ('/') for an OR list. No white space is allowed. So
2231
2311
2232
 author:john,lennon
2312
 author:john,lennon
...
...
2236
2316
2237
 author:john/ringo
2317
 author:john/ringo
2238
2318
2239
   would search for john or ringo.
2319
   would search for john or ringo.
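   The comma and slash shortcuts can be pictured as an expansion into
   separate field clauses. The sketch below is an illustration only, not
   Recoll's implementation. The function name is hypothetical, and rejecting
   a value which mixes commas and slashes is a simplification of this sketch.

```python
def expand_field_list(element):
    """Expand field:a,b or field:a/b into an operator and field clauses.

    Hypothetical helper: returns ("AND" or "OR", [(field, term), ...]).
    """
    field, sep, value = element.rpartition(":")
    if not sep:
        raise ValueError("element has no field specification")
    if "," in value and "/" in value:
        raise ValueError("mixed separators are not handled by this sketch")
    if "/" in value:
        return ("OR", [(field, term) for term in value.split("/")])
    return ("AND", [(field, term) for term in value.split(",")])


if __name__ == "__main__":
    print(expand_field_list("author:john,lennon"))
    print(expand_field_list("author:john/ringo"))
```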
2240
2320
2241
   Modifiers can be set on a phrase clause, for example to specify a
2321
   Modifiers can be set on a double-quote value, for example to specify a
2242
   proximity search (unordered). See the modifier section.
2322
   proximity search (unordered). See the modifier section. No space must
2323
   separate the final double-quote and the modifiers value, e.g. "two
2324
   one"po10
2243
2325
2244
   Recoll currently manages the following default fields:
2326
   Recoll currently manages the following default fields:
2245
2327
2246
     o title, subject or caption are synonyms which specify data to be
2328
     o title, subject or caption are synonyms which specify data to be
2247
       searched for in the document title or subject.
2329
       searched for in the document title or subject.
...
...
2353
       text/media/presentation/etc.). The classification of MIME types in
2435
       text/media/presentation/etc.). The classification of MIME types in
2354
       categories is defined in the Recoll configuration (mimeconf), and can
2436
       categories is defined in the Recoll configuration (mimeconf), and can
2355
       be modified or extended. The default category names are those which
2437
       be modified or extended. The default category names are those which
2356
       permit filtering results in the main GUI screen. Categories are OR'ed
2438
       permit filtering results in the main GUI screen. Categories are OR'ed
2357
       like MIME types above. This can't be negated with - either.
2439
       like MIME types above. This can't be negated with - either.
2358
2359
   Words inside phrases and capitalized words are not stem-expanded.
2360
   Wildcards may be used anywhere inside a term. Specifying a wild-card on
2361
   the left of a term can produce a very slow search (or even an incorrect
2362
   one if the expansion is truncated because of excessive size). Also see
2363
   More about wildcards.
2364
2440
2365
   The document input handlers used while indexing have the possibility to
2441
   The document input handlers used while indexing have the possibility to
2366
   create other fields with arbitrary names, and aliases may be defined in
2442
   create other fields with arbitrary names, and aliases may be defined in
2367
   the configuration, so that the exact field search possibilities may be
2443
   the configuration, so that the exact field search possibilities may be
2368
   different for you if someone took care of the customisation.
2444
   different for you if someone took care of the customisation.
...
...
3247
3323
3248
Chapter 5. Installation and configuration
3324
Chapter 5. Installation and configuration
3249
3325
3250
5.1. Installing a binary copy
3326
5.1. Installing a binary copy
3251
3327
3252
   There are three types of binary Recoll installations:
3328
   Recoll binary copies are always distributed as regular packages for your
3329
   system. They can be obtained either through the system's normal software
3330
   distribution framework (e.g. Debian/Ubuntu apt, FreeBSD ports, etc.), or
3331
   from some type of "backports" repository providing versions newer than the
3332
   standard ones, or found on the Recoll web site in some cases.
3253
3333
3254
     o Through your system normal software distribution framework (ie,
3334
   There used to exist another form of binary install, as pre-compiled source
3255
       Debian/Ubuntu apt, FreeBSD ports, etc.).
3335
   trees, but these are just less convenient than the packages and don't
3336
   exist any more.
3256
3337
3257
     o From a package downloaded from the Recoll web site.
3338
   The package management tools will usually automatically deal with hard
3339
   dependencies for packages obtained from a proper package repository. You
3340
   will have to deal with them by hand for downloaded packages (for example,
3341
   when dpkg complains about missing dependencies).
3258
3342
3259
     o From a prebuilt tree downloaded from the Recoll web site.
3260
3261
   In all cases, the strict software dependancies (ie on Xapian or iconv)
3262
   will be automatically satisfied, you should not have to worry about them.
3263
3264
   You will only have to check or install supporting applications for the
3343
   In all cases, you will have to check or install supporting applications
3265
   file types that you want to index beyond those that are natively processed
3344
   for the file types that you want to index beyond those that are natively
3266
   by Recoll (text, HTML, email files, and a few others).
3345
   processed by Recoll (text, HTML, email files, and a few others).
3267
3346
3268
   You should also maybe have a look at the configuration section (but this
3347
   You should also maybe have a look at the configuration section (but this
3269
   may not be necessary for a quick test with default parameters). Most
3348
   may not be necessary for a quick test with default parameters). Most
3270
   parameters can be more conveniently set from the GUI interface.
3349
   parameters can be more conveniently set from the GUI interface.
3271
3272
  5.1.1. Installing through a package system
3273
3274
   If you use a BSD-type port system or a prebuilt package (DEB, RPM,
3275
   manually or through the system software configuration utility), just
3276
   follow the usual procedure for your system.
3277
3278
  5.1.2. Installing a prebuilt Recoll
3279
3280
   The unpackaged binary versions on the Recoll web site are just compressed
3281
   tar files of a build tree, where only the useful parts were kept
3282
   (executables and sample configuration).
3283
3284
   The executable binary files are built with a static link to libxapian and
3285
   libiconv, to make installation easier (no dependencies).
3286
3287
   After extracting the tar file, you can proceed with installation as if you
3288
   had built the package from source (that is, just type make install). The
3289
   binary trees are built for installation to /usr/local.
3290
3350
3291
5.2. Supporting packages
3351
5.2. Supporting packages
3292
3352
3293
   Recoll uses external applications to index some file types. You need to
3353
   Recoll uses external applications to index some file types. You need to
3294
   install them for the file types that you wish to have indexed (these are
3354
   install them for the file types that you wish to have indexed (these are
...
...
3485
     o Of course the usual autoconf configure options, like --prefix apply.
3545
     o Of course the usual autoconf configure options, like --prefix apply.
3486
3546
3487
   Normal procedure:
3547
   Normal procedure:
3488
3548
3489
         cd recoll-xxx
3549
         cd recoll-xxx
3490
         configure
3550
         ./configure
3491
         make
3551
         make
3492
         (practices usual hardship-repelling invocations)
3552
         (practices usual hardship-repelling invocations)
3493
      
3553
      
3494
3554
3495
   There is little auto-configuration. The configure script will mainly link
3555
   There is little auto-configuration. The configure script will mainly link
...
...
3622
       handle multiple encodings in a single file. In this relatively
3682
       handle multiple encodings in a single file. In this relatively
3623
       unlikely case, you can edit the configuration file as two separate
3683
       unlikely case, you can edit the configuration file as two separate
3624
       text files with appropriate encodings, and concatenate them to create
3684
       text files with appropriate encodings, and concatenate them to create
3625
       the complete configuration.
3685
       the complete configuration.
3626
3686
3687
  5.4.1. Environment variables
3688
3689
   RECOLL_CONFDIR
3690
3691
           Defines the main configuration directory.
3692
3693
   RECOLL_TMPDIR, TMPDIR
3694
3695
           Locations for temporary files, in this order of priority. The
3696
           default if none of these is set is to use /tmp. Big temporary
3697
           files may be created during indexing, mostly for decompressing,
3698
           and also for processing, e.g. email attachments.
3699
3700
   RECOLL_CONFTOP, RECOLL_CONFMID
3701
3702
           Allow adding configuration directories with priorities below and
3703
           above the user directory (see the Configuration overview section
3704
           above for details).
3705
3706
   RECOLL_EXTRA_DBS, RECOLL_ACTIVE_EXTRA_DBS
3707
3708
           Help for setting up external indexes. See this paragraph for
3709
           explanations.
3710
3711
   RECOLL_DATADIR
3712
3713
           Defines a replacement for the default location of Recoll data
3714
           files (normally found in, e.g., /usr/share/recoll).
3715
3716
   RECOLL_FILTERSDIR
3717
3718
           Defines a replacement for the default location of Recoll filters
3719
           (normally found in, e.g., /usr/share/recoll/filters).
3720
3721
   ASPELL_PROG
3722
3723
           aspell program to use for creating the spelling dictionary. The
3724
           result has to be compatible with the libaspell which Recoll is
3725
           using.
3726
3727
   VARNAME
3728
3729
           Blabla
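
   The priority order described above for RECOLL_TMPDIR and TMPDIR can be
   sketched as follows. This is only an illustration in Python, not Recoll's
   actual code: the function name resolve_tmpdir is hypothetical, and
   treating an empty value as unset is an assumption.

```python
import os


def resolve_tmpdir(environ=None):
    """Return the temporary directory: RECOLL_TMPDIR, then TMPDIR, then /tmp."""
    env = os.environ if environ is None else environ
    for name in ("RECOLL_TMPDIR", "TMPDIR"):
        value = env.get(name)
        if value:  # assumption: an empty value counts as unset
            return value
    return "/tmp"
```

   Calling resolve_tmpdir() with no argument reads the real process
   environment; passing a dictionary makes the lookup easy to try out.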
3730
3627
  5.4.1. The main configuration file, recoll.conf
3731
  5.4.2. The main configuration file, recoll.conf
3628
3732
3629
   recoll.conf is the main configuration file. It defines things like what to
3733
   recoll.conf is the main configuration file. It defines things like what to
3630
   index (top directories and things to ignore), and the default character
3734
   index (top directories and things to ignore), and the default character
3631
   set to use for document types which do not specify it internally.
3735
   set to use for document types which do not specify it internally.
3632
3736
...
...
3637
3741
3638
   Most of the following parameters can be changed from the Index
3742
   Most of the following parameters can be changed from the Index
3639
   Configuration menu in the recoll interface. Some can only be set by
3743
   Configuration menu in the recoll interface. Some can only be set by
3640
   editing the configuration file.
3744
   editing the configuration file.
3641
3745
3642
    5.4.1.1. Parameters affecting what documents we index:
3746
    5.4.2.1. Parameters affecting what documents we index:
3643
3747
3644
   topdirs
3748
   topdirs
3645
3749
3646
           Specifies the list of directories or files to index (recursively
3750
           Specifies the list of directories or files to index (recursively
3647
           for directories). You can use symbolic links as elements of this
3751
           for directories). You can use symbolic links as elements of this
...
...
3671
           hidden directories, and you probably want this indexed. One
3775
           hidden directories, and you probably want this indexed. One
3672
           possible solution is to have .* in skippedNames, and add things
3776
           possible solution is to have .* in skippedNames, and add things
3673
           like ~/.thunderbird or ~/.evolution in topdirs.
3777
           like ~/.thunderbird or ~/.evolution in topdirs.
3674
3778
3675
           Not even the file names are indexed for patterns in this list. See
3779
           Not even the file names are indexed for patterns in this list. See
3676
           the recoll_noindex variable in mimemap for an alternative approach
3780
           the noContentSuffixes variable for an alternative approach which
3677
           which indexes the file names.
3781
           indexes the file names.
3782
3783
   noContentSuffixes
3784
3785
           This is a list of file name endings (not wildcard expressions, nor
3786
           dot-delimited suffixes). Only the names of matching files will be
3787
           indexed (no attempt at MIME type identification, no decompression,
3788
           no content indexing). This can be redefined for subdirectories,
3789
           and edited from the GUI. The default value is:
3790
3791
 noContentSuffixes = .md5 .map \
3792
        .o .lib .dll .a .sys .exe .com \
3793
        .mpp .mpt .vsd \
3794
        .img .img.gz .img.bz2 .img.xz .image .image.gz .image.bz2 .image.xz \
3795
        .dat .bak .rdf .log.gz .log .db .msf .pid \
3796
        ,v ~ #
3678
3797
3679
   skippedPaths and daemSkippedPaths
3798
   skippedPaths and daemSkippedPaths
3680
3799
3681
           A space-separated list of patterns for paths of files or
3800
           A space-separated list of patterns for paths of files or
3682
           directories that should be skipped. There is no default in the
3801
           directories that should be skipped. There is no default in the
...
...
3792
3911
3793
           The path to the web indexing queue. This is hard-coded in the
3912
           The path to the web indexing queue. This is hard-coded in the
3794
           Firefox plugin as ~/.recollweb/ToIndex so there should be no need
3913
           Firefox plugin as ~/.recollweb/ToIndex so there should be no need
3795
           to change it.
3914
           to change it.
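
   As a rough illustration of how noContentSuffixes entries behave (plain
   file name endings rather than wildcard patterns), the following Python
   sketch checks a file name against a few of the default values. The
   function name names_only and the shortened suffix list are hypothetical,
   and case-sensitive comparison is an assumption.

```python
# A few of the default endings; note that ",v", "~" and "#" contain no dot.
NO_CONTENT_SUFFIXES = (".md5", ".o", ".img.gz", ".log", ",v", "~", "#")


def names_only(filename, suffixes=NO_CONTENT_SUFFIXES):
    """Return True if only the file name (not the content) would be indexed."""
    # str.endswith accepts a tuple and performs a plain ending comparison,
    # so "*.log"-style wildcards are neither needed nor understood here.
    return filename.endswith(tuple(suffixes))
```

   For example, names_only("main.c,v") and names_only("notes.txt~") are both
   true, while names_only("catalog") is not, because the name does not end
   in ".log".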
3796
3915
3797
    5.4.1.2. Parameters affecting how we generate terms:
3916
    5.4.2.2. Parameters affecting how we generate terms:
3798
3917
3799
   Changing some of these parameters will require a full reindex. Also, when
3918
   Changing some of these parameters will require a full reindex. Also, when
3800
   using multiple indexes, it may not make sense to search indexes that don't
3919
   using multiple indexes, it may not make sense to search indexes that don't
3801
   share the values for these parameters, because they usually affect both
3920
   share the values for these parameters, because they usually affect both
3802
   search and index operations.
3921
   search and index operations.
...
...
3967
 field2 = value for field2
4086
 field2 = value for field2
3968
                
4087
                
3969
4088
3970
           field1 and field2 will be set inside the document metadata.
4089
           field1 and field2 will be set inside the document metadata.
3971
4090
3972
    5.4.1.3. Parameters affecting where and how we store things:
4091
    5.4.2.3. Parameters affecting where and how we store things:
3973
4092
3974
   dbdir
4093
   dbdir
3975
4094
3976
           The name of the Xapian data directory. It will be created if
4095
           The name of the Xapian data directory. It will be created if
3977
           needed when the index is initialized. If this is not an absolute
4096
           needed when the index is initialized. If this is not an absolute
...
...
4026
           usage also depends on average document size. The default value is
4145
           usage also depends on average document size. The default value is
4027
           10, and it is probably a bit low. If your system usually has free
4146
           10, and it is probably a bit low. If your system usually has free
4028
           memory, you can try higher values between 20 and 80. In my
4147
           memory, you can try higher values between 20 and 80. In my
4029
           experience, values beyond 100 are always counterproductive.
4148
           experience, values beyond 100 are always counterproductive.
4030
4149
4031
    5.4.1.4. Parameters affecting multithread processing
4150
    5.4.2.4. Parameters affecting multithread processing
4032
4151
4033
   The Recoll indexing process recollindex can use multiple threads to speed
4152
   The Recoll indexing process recollindex can use multiple threads to speed
4034
   up indexing on multiprocessor systems. The work done to index files is
4153
   up indexing on multiprocessor systems. The work done to index files is
4035
   divided into several stages and some of the stages can be executed by
4154
   divided into several stages and some of the stages can be executed by
4036
   multiple threads. The stages are:
4155
   multiple threads. The stages are:
...
...
4089
   The following example would disable multithreading. Indexing will be
4208
   The following example would disable multithreading. Indexing will be
4090
   performed by a single thread.
4209
   performed by a single thread.
4091
4210
4092
 thrQSizes = -1 -1 -1
4211
 thrQSizes = -1 -1 -1
4093
4212
4094
    5.4.1.5. Miscellaneous parameters:
4213
    5.4.2.5. Miscellaneous parameters:
4095
4214
4096
   autodiacsens
4215
   autodiacsens
4097
4216
4098
           If the index is not stripped, decide whether we automatically trigger
4217
           If the index is not stripped, decide whether we automatically trigger
4099
           diacritics sensitivity if the search term has accented characters
4218
           diacritics sensitivity if the search term has accented characters
...
...
4118
   logfilename, daemlogfilename
4237
   logfilename, daemlogfilename
4119
4238
4120
           Where the messages should go. 'stderr' can be used as a special
4239
           Where the messages should go. 'stderr' can be used as a special
4121
           value, and is the default. The daem version is specific to the
4240
           value, and is the default. The daem version is specific to the
4122
           indexing monitor daemon.
4241
           indexing monitor daemon.
4242
4243
   checkneedretryindexscript
4244
4245
           This defines the name for a command executed by recollindex when
4246
           starting indexing. If the exit status of the command is 0,
4247
           recollindex retries indexing all files which previously could not
4248
           be indexed because of data extraction errors. The default value is
4249
           a script which checks if any of the common bin directories have
4250
           changed (indicating that a helper program may have been
4251
           installed).
4123
4252
4124
   mondelaypatterns
4253
   mondelaypatterns
4125
4254
4126
           This allows specifying wildcard path patterns (processed with
4255
           This allows specifying wildcard path patterns (processed with
4127
           fnmatch(3) with 0 flag), to match files which change too often and
4256
           fnmatch(3) with 0 flag), to match files which change too often and
...
...
4209
           This allows defining location-related quirks for the mailbox
4338
           This allows defining location-related quirks for the mailbox
4210
           handler. Currently only the tbird flag is defined, and it should
4339
           handler. Currently only the tbird flag is defined, and it should
4211
           be set for directories which hold Thunderbird data, as their
4340
           be set for directories which hold Thunderbird data, as their
4212
           folder format is weird.
4341
           folder format is weird.
4213
4342
4214
  5.4.2. The fields file
4343
  5.4.3. The fields file
4215
4344
4216
   This file contains information about dynamic fields handling in Recoll.
4345
   This file contains information about dynamic fields handling in Recoll.
4217
   Some very basic fields have hard-wired behaviour, and, mostly, you should
4346
   Some very basic fields have hard-wired behaviour, and, mostly, you should
4218
   not change the original data inside the fields file. But you can create
4347
   not change the original data inside the fields file. But you can create
4219
   custom fields fitting your data and handle them just like they were native
4348
   custom fields fitting your data and handle them just like they were native
...
...
4280
 [mail]
4409
 [mail]
4281
 # Extract the X-My-Tag mail header, and use it internally with the
4410
 # Extract the X-My-Tag mail header, and use it internally with the
4282
 # mailmytag field name
4411
 # mailmytag field name
4283
 x-my-tag = mailmytag
4412
 x-my-tag = mailmytag
4284
4413
4285
    5.4.2.1. Extended attributes in the fields file
4414
    5.4.3.1. Extended attributes in the fields file
4286
4415
4287
   Recoll versions 1.19 and later process user extended file attributes as
4416
   Recoll versions 1.19 and later process user extended file attributes as
4288
   document fields by default.
4417
   document fields by default.
4289
4418
4290
   Attributes are processed as fields of the same name, after removing the
4419
   Attributes are processed as fields of the same name, after removing the
...
...
4292
4421
4293
   The [xattrtofields] section of the fields file allows specifying
4422
   The [xattrtofields] section of the fields file allows specifying
4294
   translations from extended attribute names to Recoll field names. An
4423
   translations from extended attribute names to Recoll field names. An
4295
   empty translation disables use of the corresponding attribute data.
4424
   empty translation disables use of the corresponding attribute data.
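
   The [xattrtofields] rules can be pictured with the short Python sketch
   below. It is an illustration only, not Recoll's code: the function name
   and the attribute names in the usage note are hypothetical, and the
   attribute name is assumed to have already been prepared as described
   above.

```python
def field_for_xattr(attr_name, translations):
    """Return the Recoll field name for an attribute, or None if disabled."""
    if attr_name not in translations:
        # No entry: the attribute becomes a field of the same name.
        return attr_name
    # An empty translation disables use of the attribute data.
    return translations[attr_name] or None
```

   With translations = {"tags": "keywords", "comment": ""}, the attribute
   tags is stored as the keywords field, comment is ignored, and any other
   attribute keeps its own name.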
4296
4425
4297
  5.4.3. The mimemap file
4426
  5.4.4. The mimemap file
4298
4427
4299
   mimemap specifies the file name extension to MIME type mappings.
4428
   mimemap specifies the file name extension to MIME type mappings.
4300
4429
4301
   For file names without an extension, or with an unknown one, the system's
4430
   For file names without an extension, or with an unknown one, the system's
4302
   file -i command will be executed to determine the MIME type (this can be
4431
   file -i command will be executed to determine the MIME type (this can be
...
...
4305
   The mappings can be specified on a per-subtree basis, which may be useful
4434
   The mappings can be specified on a per-subtree basis, which may be useful
4306
   in some cases. Example: gaim logs have a .txt extension but should be
4435
   in some cases. Example: gaim logs have a .txt extension but should be
4307
   handled specially, which is possible because they are usually all located
4436
   handled specially, which is possible because they are usually all located
4308
   in one place.
4437
   in one place.
4309
4438
4310
   mimemap also has a recoll_noindex variable which is a list of suffixes.
4439
   The recoll_noindex mimemap variable has been moved to recoll.conf and
4311
   Matching files will be skipped (which avoids unnecessary decompressions or
4440
   renamed to noContentSuffixes, while keeping the same function, as of
4312
   file executions). This is partially redundant with skippedNames in the
4441
   Recoll version 1.21. For older Recoll versions, see the documentation for
4313
   main configuration file, with a few differences: it will not affect
4442
   noContentSuffixes but use recoll_noindex in mimemap.
4314
   directories, it cannot be made dependant on the file-system location (it
4315
   is a configuration-wide parameter), and the file names will still be
4316
   indexed (not even the file names are indexed for patterns in skippedNames.
4317
   recoll_noindex is used mostly for things known to be unindexable by a
4318
   given Recoll version. Having it there avoids cluttering the more
4319
   user-oriented and locally customized skippedNames.
4320
4443
4321
  5.4.4. The mimeconf file
4444
  5.4.5. The mimeconf file
4322
4445
4323
   mimeconf specifies how the different MIME types are handled for indexing,
4446
   mimeconf specifies how the different MIME types are handled for indexing,
4324
   and which icons are displayed in the recoll result lists.
4447
   and which icons are displayed in the recoll result lists.
4325
4448
4326
   Changing the parameters in the [index] section is probably not a good idea
4449
   Changing the parameters in the [index] section is probably not a good idea
...
...
4328
4451
4329
   The [icons] section allows you to change the icons which are displayed by
4452
   The [icons] section allows you to change the icons which are displayed by
4330
   recoll in the result lists. The values are the basenames of the png images
4453
   recoll in the result lists. The values are the basenames of the png images
4331
   inside the iconsdir directory (specified in recoll.conf).
4454
   inside the iconsdir directory (specified in recoll.conf).
4332
4455
4333
  5.4.5. The mimeview file
4456
  5.4.6. The mimeview file
4334
4457
4335
   mimeview specifies which programs are started when you click on an Open
4458
   mimeview specifies which programs are started when you click on an Open
4336
   link in a result list. For example, HTML is normally displayed using firefox, but
4459
   link in a result list. For example, HTML is normally displayed using firefox, but
4337
   you may prefer Konqueror, your openoffice.org program might be named
4460
   you may prefer Konqueror, your openoffice.org program might be named
4338
   ooffice instead of openoffice, etc.
4461
   ooffice instead of openoffice, etc.
...
...
4397
   In addition to the predefined values above, all strings like %(fieldname)
4520
   In addition to the predefined values above, all strings like %(fieldname)
4398
   will be replaced by the value of the field named fieldname for the
4521
   will be replaced by the value of the field named fieldname for the
4399
   document. This could be used in combination with field customisation to
4522
   document. This could be used in combination with field customisation to
4400
   help with opening the document.
4523
   help with opening the document.
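
   The %(fieldname) replacement can be sketched in Python as follows. This is
   a minimal illustration rather than Recoll's implementation: the function
   name expand_fields, the page field and the --page option are
   hypothetical, and expanding a missing field to an empty string is an
   assumption.

```python
import re


def expand_fields(command, fields):
    """Replace each %(name) in command with the document's field value."""
    def lookup(match):
        # Assumption: a field missing from the document expands to "".
        return fields.get(match.group(1), "")

    return re.sub(r"%\(([^)]*)\)", lookup, command)
```

   For instance, expand_fields("blobviewer --page %(page)", {"page": "12"})
   yields the command line "blobviewer --page 12".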
4401
4524
4402
  5.4.6. The ptrans file
4525
  5.4.7. The ptrans file
4403
4526
4404
   ptrans specifies query-time path translations. These can be useful in
4527
   ptrans specifies query-time path translations. These can be useful in
4405
   multiple cases.
4528
   multiple cases.
4406
4529
4407
   The file has a section for any index which needs translations, either the
4530
   The file has a section for any index which needs translations, either the
...
...
4416
           [/path/to/additional/xapiandb]
4539
           [/path/to/additional/xapiandb]
4417
           /server/volume1/docdir = /net/server/volume1/docdir
4540
           /server/volume1/docdir = /net/server/volume1/docdir
4418
           /server/volume2/docdir = /net/server/volume2/docdir
4541
           /server/volume2/docdir = /net/server/volume2/docdir
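
   A translation of this kind amounts to a prefix replacement, which the
   Python sketch below illustrates. It is not Recoll's code: the function
   name translate_path is hypothetical, and two points are assumptions,
   namely that the left-hand side is the path as stored in the index and
   that a prefix only matches on whole path components.

```python
def translate_path(path, translations):
    """Apply the first matching prefix translation to path."""
    for old, new in translations.items():
        # Assumption: match whole path components only, so that
        # /a/b does not match /a/bc.
        if path == old or path.startswith(old + "/"):
            return new + path[len(old):]
    return path  # no translation applies
```

   With the first entry of the example section,
   translate_path("/server/volume1/docdir/a.txt", ...) would return
   "/net/server/volume1/docdir/a.txt".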
4419
        
4542
        
4420
4543
4421
  5.4.7. Examples of configuration adjustments
4544
  5.4.8. Examples of configuration adjustments
4422
4545
4423
    5.4.7.1. Adding an external viewer for a non-indexed type
4546
    5.4.8.1. Adding an external viewer for a non-indexed type
4424
4547
4425
   Imagine that you have some kind of file which does not have indexable
4548
   Imagine that you have some kind of file which does not have indexable
4426
   content, but for which you would like to have a functional Open link in
4549
   content, but for which you would like to have a functional Open link in
4427
   the result list (when found by file name). The file names end in .blob and
4550
   the result list (when found by file name). The file names end in .blob and
4428
   can be displayed by application blobviewer.
4551
   can be displayed by application blobviewer.
...
...
4448
   MIME type which it already knows, you would just need to edit mimeview.
4571
   MIME type which it already knows, you would just need to edit mimeview.
4449
   The entries you add in your personal file override those in the central
4572
   The entries you add in your personal file override those in the central
4450
   configuration, which you do not need to alter. mimeview can also be
4573
   configuration, which you do not need to alter. mimeview can also be
4451
   modified from the GUI.
4574
   modified from the GUI.
4452
4575
4453
    5.4.7.2. Adding indexing support for a new file type
4576
    5.4.8.2. Adding indexing support for a new file type
4454
4577
4455
   Let us now imagine that the above .blob files actually contain indexable
4578
   Let us now imagine that the above .blob files actually contain indexable
4456
   text and that you know how to extract it with a command line program.
4579
   text and that you know how to extract it with a command line program.
4457
   Getting Recoll to index the files is easy. You need to perform the above
4580
   Getting Recoll to index the files is easy. You need to perform the above
4458
   alteration, and also to add data to the mimeconf file (typically in
4581
   alteration, and also to add data to the mimeconf file (typically in