a/src/README b/src/README
...
...
44
44
45
                             2.2.1. Xapian index formats
45
                             2.2.1. Xapian index formats
46
46
47
                             2.2.2. Security aspects
47
                             2.2.2. Security aspects
48
48
49
                2.3. Indexing configuration
49
                2.3. Index configuration
50
50
51
                             2.3.1. Index case and diacritics sensitivity
52
51
                             2.3.1. The indexing configuration GUI
53
                             2.3.2. The index configuration GUI
52
54
53
                2.4. Using Beagle WEB browser plugins
55
                2.4. Using Beagle WEB browser plugins
54
56
55
                2.5. Periodic indexing
57
                2.5. Periodic indexing
56
58
...
...
100
102
101
                3.4. The query language
103
                3.4. The query language
102
104
103
                             3.4.1. Modifiers
105
                             3.4.1. Modifiers
104
106
107
                3.5. Search case and diacritics sensitivity
108
105
                3.5. Anchored searches and wildcards
109
                3.6. Anchored searches and wildcards
106
110
107
                             3.5.1. More about wildcards
111
                             3.6.1. More about wildcards
108
112
109
                             3.5.2. Anchored searches
113
                             3.6.2. Anchored searches
110
114
111
                3.6. Desktop integration
115
                3.7. Desktop integration
112
116
113
                             3.6.1. Hotkeying recoll
117
                             3.7.1. Hotkeying recoll
114
118
115
                             3.6.2. The KDE Kicker Recoll applet
119
                             3.7.2. The KDE Kicker Recoll applet
116
120
117
                3.7. Multiple databases
121
                3.8. Multiple databases
118
122
119
   4. Programming interface
123
   4. Programming interface
120
124
121
                4.1. Writing a document filter
125
                4.1. Writing a document filter
122
126
123
                             4.1.1. Simple filters
127
                             4.1.1. Simple filters
124
128
125
                             4.1.2. Telling Recoll about the filter
129
                             4.1.2. Telling Recoll about the filter
126
130
127
                             4.1.3. Filter HTML output
131
                             4.1.3. Filter HTML output
132
133
                             4.1.4. Page numbers
128
134
129
                4.2. Field data processing
135
                4.2. Field data processing
130
136
131
                4.3. API
137
                4.3. API
132
138
...
...
248
   Stemming is the process by which Recoll reduces words to their radicals so
254
   Stemming is the process by which Recoll reduces words to their radicals so
249
   that searching does not depend, for example, on a word being singular or
255
   that searching does not depend, for example, on a word being singular or
250
   plural (floor, floors), or on a verb tense (flooring, floored). Because
256
   plural (floor, floors), or on a verb tense (flooring, floored). Because
251
   the mechanisms used for stemming depend on the specific grammatical rules
257
   the mechanisms used for stemming depend on the specific grammatical rules
252
   for each language, there is a separate stemmer module for most common
258
   for each language, there is a separate stemmer module for most common
253
   languages where stemming makes sense. Storing documents written in
259
   languages where stemming makes sense.
254
   different languages in the same index is possible, and commonly done. In
260
255
   this situation, you can specify several stemming languages for the index.
256
   Recoll stores the unstemmed versions of terms in the main index and uses
261
   Recoll stores the unstemmed versions of terms in the main index and uses
257
   auxiliary databases for term expansion (one for each stemming language),
262
   auxiliary databases for term expansion (one for each stemming language),
258
   which means that you can switch stemming languages between searches, or
263
   which means that you can switch stemming languages between searches, or
259
   add a language without needing a full reindex. Recoll currently makes no
264
   add a language without needing a full reindex.
260
   attempt at automatic language recognition, which means that the stemmer
265
261
   will sometimes be applied to terms from other languages with potentially
266
   Storing documents written in different languages in the same index is
262
   strange results. In practice, even if this introduces possibilities of
267
   possible, and commonly done. In this situation, you can specify several
263
   confusion, this approach has been proven quite useful, and, awaiting the
268
   stemming languages for the index.
264
   addition of an automatic language recognition module to Recoll, it is much
269
265
   less cumbersome than separating your documents according to what language
270
   Recoll currently makes no attempt at automatic language recognition, which
266
   they are written in.
271
   means that the stemmer will sometimes be applied to terms from other
272
   languages with potentially strange results. In practice, even if this
273
   introduces possibilities of confusion, this approach has been proven quite
274
   useful, and, awaiting the addition of an automatic language recognition
275
   module to Recoll, it is much less cumbersome than separating your
276
   documents according to what language they are written in.
277
278
   Before version 1.18, Recoll always stripped most accents and diacritics
279
   from terms, and converted them to lower case before storing them in the
280
   index. As a consequence, it was impossible to search for a particular
281
   capitalization of a term (US / us), or to discriminate two terms based on
282
   diacritics (sake / saké, mate / maté).
283
284
   As of version 1.18, Recoll can optionally store the raw terms, without
285
   accent stripping or case conversion. Expansions necessary for searches
286
   insensitive to case and/or diacritics are then performed when searching.
287
   This is described in more detail in the section about index case and
288
   diacritics sensitivity.
267
289
268
   Recoll has many parameters which define exactly what to index, and how to
290
   Recoll has many parameters which define exactly what to index, and how to
269
   classify and decode the source documents. These are kept in configuration
291
   classify and decode the source documents. These are kept in configuration
270
   files. A default configuration is copied into a standard location (usually
292
   files. A default configuration is copied into a standard location (usually
271
   something like /usr/[local/]share/recoll/examples) during installation.
293
   something like /usr/[local/]share/recoll/examples) during installation.
...
...
350
   search precision.
372
   search precision.
351
373
352
   The generated indexes can be queried concurrently in a transparent manner.
374
   The generated indexes can be queried concurrently in a transparent manner.
353
375
354
   For index generation, multiple configurations are totally independent from
376
   For index generation, multiple configurations are totally independent from
355
   each other. When multiple indexes are used for searches, some parameters
377
   each other. When multiple indexes need to be used for a single search,
356
   should be consistent among the configurations.
378
   some parameters should be consistent among the configurations.
357
379
358
     ----------------------------------------------------------------------
380
     ----------------------------------------------------------------------
359
381
360
  2.1.3. Document types
382
  2.1.3. Document types
361
383
...
...
478
   need for your index, set the directory and files access modes
500
   need for your index, set the directory and files access modes
479
   appropriately, and also maybe adjust the umask used during index updates.
501
   appropriately, and also maybe adjust the umask used during index updates.
480
502
481
     ----------------------------------------------------------------------
503
     ----------------------------------------------------------------------
482
504
483
2.3. Indexing configuration
505
2.3. Index configuration
484
506
485
   Variables set inside the Recoll configuration files control which areas of
507
   Variables set inside the Recoll configuration files control which areas of
486
   the file system are indexed, and how files are processed. These variables
508
   the file system are indexed, and how files are processed. These variables
487
   can be set either by editing the text files or using the dialogs in the
509
   can be set either by editing the text files or using the dialogs in the
488
   recoll GUI.
510
   recoll GUI.
...
...
504
   (e.g. pdf, postscript, ms-word...) are described in the external packages
526
   (e.g. pdf, postscript, ms-word...) are described in the external packages
505
   section.
527
   section.
506
528
507
     ----------------------------------------------------------------------
529
     ----------------------------------------------------------------------
508
530
531
  2.3.1. Index case and diacritics sensitivity
532
533
   As of Recoll version 1.18 you have a choice of building an index with
534
   terms stripped of character case and diacritics, or one with raw terms.
535
   For a source term of Résumé, the former will store resume, the latter
536
   Résumé.
537
538
   Each type of index allows performing searches insensitive to case and
539
   diacritics: with a raw index, the user entry will be expanded to match all
540
   case and diacritics variations present in the index. With a stripped
541
   index, the search term will be stripped before searching.
542
543
   A raw index allows for another possibility which a stripped index cannot
544
   offer: using case and diacritics to discriminate between terms, returning
545
   different results when searching for US and us or resume and résumé. Read
546
   the section about search case and diacritics sensitivity for more details.
547
548
   The type of index to be created is controlled by the indexStripChars
549
   configuration variable which can only be changed by editing the
550
   configuration file. Any change implies an index reset (not automated by
551
   Recoll), and all indexes in a search must be set in the same way (again,
552
   not checked by Recoll).
553
554
   If indexStripChars is not set, Recoll 1.18 creates a stripped index by
555
   default, for compatibility with previous versions.
556
557
   As a cost for added capability, a raw index will be slightly bigger than a
558
   stripped one (around 10%). Also, searches will be more complex, so
559
   probably slightly slower, and the feature is still young, and a certain
560
   amount of weirdness cannot be excluded.
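
   The difference between the two index types can be illustrated with a
   short Python sketch. It assumes that stripping amounts to Unicode
   decomposition, removal of combining marks and lower-casing, which is
   only an approximation: the actual Recoll processing is not described
   here and may differ. The names strip_term and expand_on_raw_index are
   hypothetical and are not part of Recoll.

```python
import unicodedata


def strip_term(term):
    """Return a lower-case, unaccented form of a term (illustrative only)."""
    # Decompose accented characters into a base character plus combining marks.
    decomposed = unicodedata.normalize("NFD", term)
    # Drop the combining marks and keep the base characters.
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unaccented.lower()


def expand_on_raw_index(user_entry, index_terms):
    """List the raw index terms matching user_entry, ignoring case and diacritics."""
    wanted = strip_term(user_entry)
    return sorted(t for t in index_terms if strip_term(t) == wanted)


# Assumed sample data: terms as a raw index might store them.
raw_terms = {"R\u00e9sum\u00e9", "resume", "RESUME", "US", "us"}

# A stripped index would store only this form of the accented term.
print(strip_term("R\u00e9sum\u00e9"))

# A raw index keeps every variant, so an insensitive search expands the entry.
print(expand_on_raw_index("resume", raw_terms))
```

   With a stripped index only the first operation is needed, at indexing
   time and on the search term. With a raw index the expansion happens at
   query time, which is where the extra search cost comes from.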
561
562
     ----------------------------------------------------------------------
563
509
  2.3.1. The indexing configuration GUI
564
  2.3.2. The index configuration GUI
510
565
511
   Most parameters for a given indexing configuration can be set from a
566
   Most parameters for a given index configuration can be set from a recoll
512
   recoll GUI running on this configuration (either as default, or by setting
567
   GUI running on this configuration (either as default, or by setting
513
   RECOLL_CONFDIR or the -c option.)
568
   RECOLL_CONFDIR or the -c option.)
514
569
515
   The interface is started from the Preferences->Indexing Configuration menu
570
   The interface is started from the Preferences->Index Configuration menu
516
   entry. It is divided in three tabs, Global parameters, Local parameters,
571
   entry. It is divided in four tabs, Global parameters, Local parameters,
517
   and Beagle web history, which is explained in the next section.
572
   Beagle web history (which is explained in the next section) and Search
573
   parameters.
518
574
519
   The first tab allows setting global variables, like the lists of top
575
   The Global parameters tab allows setting global variables, like the lists
520
   directories, skipped paths, or stemming languages.
576
   of top directories, skipped paths, or stemming languages.
521
577
522
   The second tab allows setting variables that can be redefined for
578
   The Local parameters tab allows setting variables that can be redefined
523
   subdirectories. This second tab has an initially empty list of
579
   for subdirectories. This second tab has an initially empty list of
524
   customisation directories, to which you can add. The variables are then
580
   customisation directories, to which you can add. The variables are then
525
   set for the currently selected directory (or at the top level if the empty
581
   set for the currently selected directory (or at the top level if the empty
526
   line is selected).
582
   line is selected).
583
584
   The Search parameters section defines parameters which are used at query
585
   time, but are global to an index and affect all search tools, not only the
586
   GUI.
527
587
528
   The meaning for most entries in the interface is self-evident and
588
   The meaning for most entries in the interface is self-evident and
529
   documented by a ToolTip popup on the text label. For more detail, you will
589
   documented by a ToolTip popup on the text label. For more detail, you will
530
   need to refer to the configuration section of this guide.
590
   need to refer to the configuration section of this guide.
531
591
...
...
548
   still use the Firefox plugin, which is written in Javascript and
608
   still use the Firefox plugin, which is written in Javascript and
549
   completely independent of C#, Beagle, Lucene..., and set Recoll to process
609
   completely independent of C#, Beagle, Lucene..., and set Recoll to process
550
   the Beagle queue directory. This supposes that Beagle is not running, else
610
   the Beagle queue directory. This supposes that Beagle is not running, else
551
   both programs will fight for the same files.
611
   both programs will fight for the same files.
552
612
553
   This feature can be enabled in the GUI indexing configuration panel, or by
613
   This feature can be enabled in the GUI Index configuration panel, or by
554
   editing the configuration file (set processbeaglequeue to 1).
614
   editing the configuration file (set processbeaglequeue to 1).
555
615
556
   There are more recent instructions about how to find and install the
616
   There are more recent instructions about how to find and install the
557
   Firefox extension on the Recoll wiki.
617
   Firefox extension on the Recoll wiki.
558
618
...
...
853
   single preview window by typing Shift+ArrowUp/Down in the window).
913
   single preview window by typing Shift+ArrowUp/Down in the window).
854
914
855
   Clicking the Open link will attempt to start an external viewer. The
915
   Clicking the Open link will attempt to start an external viewer. The
856
   viewer for each document type can be configured through the user
916
   viewer for each document type can be configured through the user
857
   preferences dialog, or by editing the mimeview configuration file. You can
917
   preferences dialog, or by editing the mimeview configuration file. You can
858
   also check the Use desktop preferences option in the user preferences
918
   also check the Use desktop preferences option in the GUI preferences
859
   dialog to use the desktop defaults for all documents. This is probably the
919
   dialog to use the desktop defaults for all documents. This is probably the
860
   best option if you are using a well configured Gnome or KDE desktop.
920
   best option if you are using a well configured Gnome or KDE desktop.
861
921
862
   The Preview and Open edit links may not be present for all entries,
922
   The Preview and Open edit links may not be present for all entries,
863
   meaning that Recoll has no configured way to preview a given file type
923
   meaning that Recoll has no configured way to preview a given file type
...
...
901
     * Find similar
961
     * Find similar
902
962
903
     * Preview Parent document
963
     * Preview Parent document
904
964
905
     * Open Parent document
965
     * Open Parent document
966
967
     * Open Snippets Window
906
968
907
   The Preview and Open entries do the same thing as the corresponding links.
969
   The Preview and Open entries do the same thing as the corresponding links.
908
970
909
   The Copy File Name and Copy Url copy the relevant data to the clipboard,
971
   The Copy File Name and Copy Url copy the relevant data to the clipboard,
910
   for later pasting.
972
   for later pasting.
...
...
927
   appear for an email which is part of an mbox folder file, but that you
989
   appear for an email which is part of an mbox folder file, but that you
928
   can't actually visualize the folder (there will be an error dialog if you
990
   can't actually visualize the folder (there will be an error dialog if you
929
   try). Recoll is unfortunately not yet smart enough to disable the entry in
991
   try). Recoll is unfortunately not yet smart enough to disable the entry in
930
   this case. In other cases, the Open option makes sense, for example to
992
   this case. In other cases, the Open option makes sense, for example to
931
   start a chm viewer on the parent document for a help page.
993
   start a chm viewer on the parent document for a help page.
994
995
   The Open Snippets Window entry will only appear for documents which
996
   support page breaks (typically PDF, Postscript, DVI). The snippets window
997
   lists extracts from the document, taken around search term occurrences,
998
   along with the corresponding page number, as links which can be used to
999
   start the native viewer on the appropriate page. If the viewer supports
1000
   it, its search function will also be primed with one of the search terms.
932
1001
933
     ----------------------------------------------------------------------
1002
     ----------------------------------------------------------------------
934
1003
935
  3.1.3. The result table
1004
  3.1.3. The result table
936
1005
...
...
1426
       the xdg-open utility will be used to open files when you click the
1495
       the xdg-open utility will be used to open files when you click the
1427
       Open link in the result list, instead of the application defined in
1496
       Open link in the result list, instead of the application defined in
1428
       mimeview. xdg-open will in turn use your desktop preferences to choose
1497
       mimeview. xdg-open will in turn use your desktop preferences to choose
1429
       an appropriate application.
1498
       an appropriate application.
1430
1499
1500
     * Exceptions: when using the desktop preferences for opening documents,
1501
       these are mime types that will still be opened according to Recoll
1502
       preferences. This is useful for passing parameters like page numbers
1503
       or search strings to applications that support them (e.g. evince).
1504
1431
     * Choose editor applications this will let you choose the command
1505
     * Choose editor applications this will let you choose the command
1432
       started by the Open links inside the result list, for specific
1506
       started by the Open links inside the result list, for specific
1433
       document types.
1507
       document types.
1434
1508
1435
     * Display category filter as toolbar... this will let you choose if the
1509
     * Display category filter as toolbar... this will let you choose if the
...
...
1566
   substitutions will be performed:
1640
   substitutions will be performed:
1567
1641
1568
     * %A. Abstract
1642
     * %A. Abstract
1569
1643
1570
     * %D. Date
1644
     * %D. Date
1645
1646
     * %E. Precooked Snippets link (will only appear for documents indexed
1647
       with page numbers)
1571
1648
1572
     * %I. Icon image name. This is normally determined from the mime type.
1649
     * %I. Icon image name. This is normally determined from the mime type.
1573
       The associations are defined inside the mimeconf configuration file.
1650
       The associations are defined inside the mimeconf configuration file.
1574
       If a thumbnail for the file is found at the standard Freedesktop
1651
       If a thumbnail for the file is found at the standard Freedesktop
1575
       location, this will be displayed instead.
1652
       location, this will be displayed instead.
...
...
1824
     * ext specifies the file name extension (Ex: ext:html)
1901
     * ext specifies the file name extension (Ex: ext:html)
1825
1902
1826
   The field syntax also supports a few field-like, but special, criteria:
1903
   The field syntax also supports a few field-like, but special, criteria:
1827
1904
1828
     * dir for filtering the results on file location (Ex:
1905
     * dir for filtering the results on file location (Ex:
1829
       dir:/home/me/somedir). -dir also works to find results out of the
1906
       dir:/home/me/somedir). -dir also works to find results not in the
1830
       specified directory, only after release 1.15.8. A tilde inside the
1907
       specified directory (release >= 1.15.8). A tilde inside the value will
1831
       value will be expanded to the home directory. dir is not a regular
1908
       be expanded to the home directory. Wildcards will not be expanded. You
1832
       field and only one value makes sense in a query (you can't use
1909
       cannot use OR with dir clauses (this restriction may go away in the
1833
       dir:dir1 OR dir:dir2). Relative paths make sense, for example,
1910
       future).
1834
       dir:share/doc would match either /usr/share/doc or
1911
1835
       /usr/local/share/doc
1912
       Relative paths also make sense, for example, dir:share/doc would match
1913
       either /usr/share/doc or /usr/local/share/doc
1914
1915
       Several dir clauses can be specified, both positive and negative. For
1916
       example the following makes sense:
1917
1918
 dir:recoll dir:src -dir:utils -dir:common
1919
            
1920
1921
       This would select results which have both recoll and src in the path
1922
       (in any order), and which have neither utils nor common.
1923
1924
       Another special aspect of dir clauses is that the values in the index
1925
       are not transcoded to UTF-8, and never lower-cased or unaccented, but
1926
       stored as binary. This means that you need to enter the values in the
1927
       exact lower or upper case, and that searches for names with diacritics
1928
       may sometimes be impossible because of character set conversion
1929
       issues. Non-ASCII UNIX file paths are an unending source of trouble
1930
       and are best avoided.
1931
1932
       You need to use double-quotes around the path value if it contains
1933
       space characters.
1836
1934
1837
     * size for filtering the results on file size. Example: size<10000. You
1935
     * size for filtering the results on file size. Example: size<10000. You
1838
       can use <, > or = as operators. You can specify a range like the
1936
       can use <, > or = as operators. You can specify a range like the
1839
       following: size>100 size<1000. The usual k/K, m/M, g/G, t/T can be
1937
       following: size>100 size<1000. The usual k/K, m/M, g/G, t/T can be
1840
       used as (decimal) multipliers. Ex: size>1k to search for files bigger
1938
       used as (decimal) multipliers. Ex: size>1k to search for files bigger
...
...
1911
       the default is 10.
2009
       the default is 10.
1912
2010
1913
     * p can be used to turn the default phrase search into a proximity one
2011
     * p can be used to turn the default phrase search into a proximity one
1914
       (unordered). Example:"order any in"p
2012
       (unordered). Example:"order any in"p
1915
2013
2014
     * C will turn on case sensitivity (if the index supports it).
2015
2016
     * D will turn on diacritics sensitivity (if the index supports it).
2017
1916
     * A weight can be specified for a query element by specifying a decimal
2018
     * A weight can be specified for a query element by specifying a decimal
1917
       value at the start of the modifiers. Example: "Important"2.5.
2019
       value at the start of the modifiers. Example: "Important"2.5.
1918
2020
1919
     ----------------------------------------------------------------------
2021
     ----------------------------------------------------------------------
1920
2022
2023
3.5. Search case and diacritics sensitivity
2024
2025
   For Recoll versions 1.18 and later, and when working with a raw index (not
2026
   the default), searches can be made sensitive to character case and
2027
   diacritics. How this happens is controlled by configuration variables and
2028
   what search data is entered.
2029
2030
   The general default is that searches are insensitive to case and
2031
   diacritics. An entry of resume will match any of Resume, RESUME, résumé,
2032
   Résumé etc.
2033
2034
   Two configuration variables can automate switching on sensitivity:
2035
2036
   autodiacsens
2037
2038
           If this is set, search sensitivity to diacritics will be turned on
2039
           as soon as an accented character exists in a search term. When the
2040
           variable is set to true, resume will start a
2041
           diacritics-insensitive search, but résumé will be matched exactly.
2042
           The default value is false.
2043
2044
   autocasesens
2045
2046
           If this is set, search sensitivity to character case will be
2047
           turned on as soon as an upper-case character exists in a search
2048
           term except for the first one. When the variable is set to true,
2049
           us or Us will start a case-insensitive search, but US will
2050
           be matched exactly. The default value is true (contrary to
2051
           autodiacsens).
2052
2053
   As in the past, capitalizing the first letter of a word will turn off its
2054
   stem expansion and have no effect on case-sensitivity.
2055
2056
   You can also explicitly activate case and diacritics sensitivity by using
2057
   modifiers with the query language. C will make the term case-sensitive,
2058
   and D will make it diacritics-sensitive. Examples:
2059
2060
         "us"C
2061
   
2062
2063
   will search for the term us exactly (Us will not be a match).
2064
2065
         "résumé"D
2066
      
2067
2068
   will search for the term résumé exactly (resume will not be a match).
2069
2070
   When either case or diacritics sensitivity is activated, stem expansion is
2071
   turned off. Having both does not make much sense.
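
   The rules above can be summarised by a small Python sketch. It is only
   an illustration of the documented behaviour for a raw index, not Recoll
   code: the function names and the way modifiers are passed as a string
   are assumptions made for the example, and the defaults mirror the
   documented ones (autocasesens true, autodiacsens false).

```python
import unicodedata


def has_diacritics(term):
    """True if any character in the term carries a combining mark."""
    decomposed = unicodedata.normalize("NFD", term)
    return any(unicodedata.combining(ch) for ch in decomposed)


def search_sensitivity(term, modifiers="", autocasesens=True, autodiacsens=False):
    """Return (case_sensitive, diac_sensitive) for one search term."""
    # Explicit query language modifiers switch sensitivity on directly.
    case_sensitive = "C" in modifiers
    diac_sensitive = "D" in modifiers
    # autocasesens: an upper-case character anywhere except in first position.
    if autocasesens and any(ch.isupper() for ch in term[1:]):
        case_sensitive = True
    # autodiacsens: any accented character in the term.
    if autodiacsens and has_diacritics(term):
        diac_sensitive = True
    return case_sensitive, diac_sensitive


def stem_expansion_enabled(term, case_sensitive, diac_sensitive):
    """A capitalized first letter or any active sensitivity turns stemming off."""
    return not (term[:1].isupper() or case_sensitive or diac_sensitive)


for entry in ("us", "Us", "US"):
    print(entry, search_sensitivity(entry))
```

   Note that Us stays case-insensitive: a capital first letter only
   disables stem expansion, as in previous versions.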
2072
2073
     ----------------------------------------------------------------------
2074
1921
3.5. Anchored searches and wildcards
2075
3.6. Anchored searches and wildcards
1922
2076
1923
   Some special characters are interpreted by Recoll in search strings to
2077
   Some special characters are interpreted by Recoll in search strings to
1924
   expand or specialize the search. Wildcards expand a root term in
2078
   expand or specialize the search. Wildcards expand a root term in
1925
   controlled ways. Anchor characters can restrict a search to succeed only
2079
   controlled ways. Anchor characters can restrict a search to succeed only
1926
   if the match is found at or near the beginning of the document or one of
2080
   if the match is found at or near the beginning of the document or one of
1927
   its fields.
2081
   its fields.
1928
2082
1929
     ----------------------------------------------------------------------
2083
     ----------------------------------------------------------------------
1930
2084
1931
  3.5.1. More about wildcards
2085
  3.6.1. More about wildcards
1932
2086
1933
   All words entered in Recoll search fields will be processed for wildcard
2087
   All words entered in Recoll search fields will be processed for wildcard
1934
   expansion before the request is finally executed.
2088
   expansion before the request is finally executed.
1935
2089
1936
   The wildcard characters are:
2090
   The wildcard characters are:
...
...
1957
       expansion will produce better results than an ending * (stem expansion
2111
       expansion will produce better results than an ending * (stem expansion
1958
       is turned off when any wildcard character appears in the term).
2112
       is turned off when any wildcard character appears in the term).
1959
2113
1960
     ----------------------------------------------------------------------
2114
     ----------------------------------------------------------------------
1961
2115
1962
  3.5.2. Anchored searches
2116
  3.6.2. Anchored searches
1963
2117
1964
   Two characters are used to specify that a search hit should occur at the
2118
   Two characters are used to specify that a search hit should occur at the
1965
   beginning or at the end of the text. ^ at the beginning of a term or
2119
   beginning or at the end of the text. ^ at the beginning of a term or
1966
   phrase constrains the search to happen at the start, $ at the end forces it
2120
   phrase constrains the search to happen at the start, $ at the end forces it
1967
   to happen at the end.
2121
   to happen at the end.
...
...
1982
   example, bla bla my unexpected term at the beginning of the text would be
2136
   example, bla bla my unexpected term at the beginning of the text would be
1983
   a match for "^my term"o5.
2137
   a match for "^my term"o5.
1984
2138
1985
     ----------------------------------------------------------------------
2139
     ----------------------------------------------------------------------
1986
2140
1987
3.6. Desktop integration
2141
3.7. Desktop integration
1988
2142
1989
   Being independent of the desktop type has its drawbacks: Recoll desktop
2143
   Being independent of the desktop type has its drawbacks: Recoll desktop
1990
   integration is minimal. Here follow a few things that may help.
2144
   integration is minimal. Here follow a few things that may help.
1991
2145
1992
     ----------------------------------------------------------------------
2146
     ----------------------------------------------------------------------
1993
2147
1994
  3.6.1. Hotkeying recoll
2148
  3.7.1. Hotkeying recoll
1995
2149
1996
   It is surprisingly convenient to be able to show or hide the Recoll GUI
2150
   It is surprisingly convenient to be able to show or hide the Recoll GUI
1997
   with a single keystroke. Recoll comes with a small Python script, based on
2151
   with a single keystroke. Recoll comes with a small Python script, based on
1998
   the libwnck window manager interface library, which will allow you to do
2152
   the libwnck window manager interface library, which will allow you to do
1999
   just this. The detailed instructions are on this wiki page.
2153
   just this. The detailed instructions are on this wiki page.
2000
2154
2001
     ----------------------------------------------------------------------
2155
     ----------------------------------------------------------------------
2002
2156
2003
  3.6.2. The KDE Kicker Recoll applet
2157
  3.7.2. The KDE Kicker Recoll applet
2004
2158
2005
   The Recoll source tree contains the source code to the recoll_applet, a
2159
   The Recoll source tree contains the source code to the recoll_applet, a
2006
   small application derived from the find_applet. This can be used to add a
2160
   small application derived from the find_applet. This can be used to add a
2007
   small Recoll launcher to the KDE panel.
2161
   small Recoll launcher to the KDE panel.
2008
2162
...
...
2021
   a new recoll GUI instance every time (even if it is already running). You
2175
   a new recoll GUI instance every time (even if it is already running). You
2022
   may find it useful anyway.
2176
   may find it useful anyway.
2023
2177
2024
     ----------------------------------------------------------------------
2178
     ----------------------------------------------------------------------
2025
2179
2026
3.7. Multiple databases
2180
3.8. Multiple databases
2027
2181
2028
   Multiple Recoll databases or indexes can be created by using several
2182
   Multiple Recoll databases or indexes can be created by using several
2029
   configuration directories which are usually set to index different areas
2183
   configuration directories which are usually set to index different areas
2030
   of the file system. A specific index can be selected for updating or
2184
   of the file system. A specific index can be selected for updating or
2031
   searching, using the RECOLL_CONFDIR environment variable or the -c option
2185
   searching, using the RECOLL_CONFDIR environment variable or the -c option
...
...
2211
2365
2212
 <meta name="somefield" content="Some textual data" />
2366
 <meta name="somefield" content="Some textual data" />
2213
2367
2214
   See the following section for details about configuring how field data is
2368
   See the following section for details about configuring how field data is
2215
   processed by the indexer.
2369
   processed by the indexer.
2370
2371
     ----------------------------------------------------------------------
2372
2373
  4.1.4. Page numbers
2374
2375
   The indexer will interpret ^L characters in the filter output as
2376
   indicating page breaks, and will record them. At query time, this allows
2377
   starting a viewer on the right page for a hit or a snippet. Currently,
2378
   only the PDF, Postscript and DVI filters generate page breaks.
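
   As a rough illustration of the mechanism, the following Python sketch maps
   a character offset in filter output to a page number by counting form feed
   (^L) characters. The function name and the sample text are hypothetical;
   this is not the indexer's actual code.

```python
FORM_FEED = "\f"  # the ^L character (ASCII 0x0C)


def page_for_offset(filter_output, offset):
    """Return the 1-based page number holding the character at `offset`.

    Assumption: every form feed in the filter output ends a page.
    """
    return filter_output.count(FORM_FEED, 0, offset) + 1


# Hypothetical three-page filter output.
text = "first page\fsecond page with a hit\fthird page"
hit_offset = text.index("hit")
print(page_for_offset(text, hit_offset))  # prints 2
```

   A viewer could then be started on that page for the hit.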
2216
2379
2217
     ----------------------------------------------------------------------
2380
     ----------------------------------------------------------------------
2218
2381
2219
4.2. Field data processing
2382
4.2. Field data processing
2220
2383
...
...
2822
2985
2823
   Recoll indexing options are set inside text configuration files located in
2986
   Recoll indexing options are set inside text configuration files located in
2824
   a configuration directory. There can be several such directories, each of
2987
   a configuration directory. There can be several such directories, each of
2825
   which defines the parameters for one index.
2988
   which defines the parameters for one index.
2826
2989
2827
   The configuration files can be edited by hand or through the Indexing
2990
   The configuration files can be edited by hand or through the Index
2828
   configuration dialog (Preferences menu). The GUI tool will try to respect
2991
   configuration dialog (Preferences menu). The GUI tool will try to respect
2829
   your formatting and comments as much as possible, so it is quite possible
2992
   your formatting and comments as much as possible, so it is quite possible
2830
   to use both methods.
2993
   to use both methods.
2831
2994
2832
   The most accurate documentation for the configuration parameters is given
2995
   The most accurate documentation for the configuration parameters is given
...
...
3019
           want to index very big text files as it will both reduce memory
3182
           want to index very big text files as it will both reduce memory
3020
           usage at index time and help with loading data to the preview
3183
           usage at index time and help with loading data to the preview
3021
           window. A size of a few megabytes would seem reasonable (default:
3184
           window. A size of a few megabytes would seem reasonable (default:
3022
           1MB).
3185
           1MB).
3023
3186
3187
   membermaxkbs
3188
3189
           This defines the maximum size in kilobytes for an archive member
3190
           (zip, tar or rar at the moment). Bigger entries will be skipped.
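
           A minimal sketch of the size check for a zip archive, in Python.
           The function name is hypothetical, and treating a kilobyte as
           1024 bytes is an assumption; the indexer's own logic may differ.

```python
import io
import zipfile


def members_to_index(archive, member_max_kbs):
    """List the archive members that are not bigger than the limit."""
    limit_bytes = member_max_kbs * 1024  # assumption: 1 KB = 1024 bytes
    with zipfile.ZipFile(archive) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if not info.is_dir() and info.file_size <= limit_bytes
        ]


# Build a small in-memory archive to try the function on.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("notes.txt", "short text")
    zf.writestr("dump.bin", b"\0" * 5000)

print(members_to_index(buffer, member_max_kbs=2))  # prints ['notes.txt']
```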
3191
3024
   indexallfilenames
3192
   indexallfilenames
3025
3193
3026
           Recoll indexes file names in a special section of the database to
3194
           Recoll indexes file names in a special section of the database to
3027
           allow specific file name searches using wildcards. This
3195
           allow specific file name searches using wildcards. This
3028
           parameter decides if file name indexing is performed only for
3196
           parameter decides if file name indexing is performed only for
...
...
3056
3224
3057
   Changing some of these parameters will require a full reindex. Also, when
3225
   Changing some of these parameters will require a full reindex. Also, when
3058
   using multiple indexes, it may not make sense to search indexes that don't
3226
   using multiple indexes, it may not make sense to search indexes that don't
3059
   share the values for these parameters, because they usually affect both
3227
   share the values for these parameters, because they usually affect both
3060
   search and index operations.
3228
   search and index operations.
3229
3230
   indexStripChars
3231
3232
           Decide if we strip characters of diacritics and convert them to
3233
           lower-case before terms are indexed. If we don't, searches
3234
           sensitive to case and diacritics can be performed, but the index
3235
           will be bigger, and some marginal weirdness may sometimes occur.
3236
           The default is a stripped index (indexStripChars = 1) for now.
3237
           When using multiple indexes for a search, this parameter must be
3238
           defined identically for all. Changing the value requires an index
3239
           reset.
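
           To give an idea of what a stripped term looks like, here is a
           Python approximation based on Unicode decomposition. It is only a
           sketch: the function name is hypothetical, and the indexer's own
           stripping code is likely to treat some characters differently.

```python
import unicodedata


def strip_term(term):
    """Remove diacritics and convert to lower case (approximation)."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", term)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.lower()


print(strip_term("Résumé"))  # prints resume
```

           With a stripped index, "Résumé", "resume" and "RESUME" all end up
           as the same term, which is why case and diacritics sensitive
           searches need a raw index.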
3240
3241
   maxTermExpand
3242
3243
           Maximum expansion count for a single term (e.g.: when using
3244
           wildcards). The default of 10000 is reasonable and will avoid
3245
           queries that appear frozen while the engine is walking the term
3246
           list.
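
           The idea of capping an expansion can be sketched in Python as
           follows. The names are hypothetical, and raising an error when
           the cap is exceeded is an assumption made for the example; the
           text above does not say how the engine reacts.

```python
from fnmatch import fnmatchcase


class TooManyExpansions(Exception):
    """Raised when a wildcard matches more terms than the cap allows."""


def expand_wildcard(pattern, index_terms, max_term_expand=10000):
    """Return the index terms matching `pattern`, up to the cap."""
    matches = []
    for term in index_terms:
        if fnmatchcase(term, pattern):
            matches.append(term)
            if len(matches) > max_term_expand:
                # Stop walking the term list as soon as the cap is passed.
                raise TooManyExpansions(pattern)
    return matches


print(expand_wildcard("rec*", ["recoll", "record", "xapian"]))
```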
3247
3248
   maxXapianClauses
3249
3250
           Maximum number of elementary clauses we can add to a single Xapian
3251
           query. In some cases, the result of term expansion can be
3252
           multiplicative, and we want to avoid using excessive memory. The
3253
           default of 100 000 should be both high enough in most cases and
3254
           compatible with current typical hardware configurations.
3061
3255
3062
   nonumbers
3256
   nonumbers
3063
3257
3064
           If this is set to true, no terms will be generated for numbers. For
3258
           If this is set to true, no terms will be generated for numbers. For
3065
           example "123", "1.5e6", "192.168.1.4", would not be indexed
3259
           example "123", "1.5e6", "192.168.1.4", would not be indexed
...
...
3198
3392
3199
     ----------------------------------------------------------------------
3393
     ----------------------------------------------------------------------
3200
3394
3201
    5.4.1.4. Miscellaneous parameters:
3395
    5.4.1.4. Miscellaneous parameters:
3202
3396
3397
   autodiacsens
3398
3399
           If the index is not stripped, decide if we automatically trigger
3400
           diacritics sensitivity if the search term has accented characters
3401
           (not in unac_except_trans). Else you need to use the query
3402
           language and the D modifier to specify diacritics sensitivity.
3403
           Default is no.
3404
3405
   autocasesens
3406
3407
           If the index is not stripped, decide if we automatically trigger
3408
           character case sensitivity if the search term has upper-case
3409
           characters in any but the first position. Else you need to use the
3410
           query language and the C modifier to specify character-case
3411
           sensitivity. Default is yes.
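
           The two automatic triggers described above can be sketched in
           Python as follows. The function names are hypothetical, and the
           unac_except_trans exceptions are left out for brevity, so this is
           an approximation of the rules rather than the actual code.

```python
import unicodedata


def triggers_case_sensitivity(term):
    """True if an upper-case character appears after the first position."""
    return any(ch.isupper() for ch in term[1:])


def triggers_diacritics_sensitivity(term):
    """True if the term has at least one accented character."""
    decomposed = unicodedata.normalize("NFD", term)
    return any(unicodedata.combining(ch) for ch in decomposed)


for term in ("Recoll", "reColl", "résumé"):
    print(term,
          triggers_case_sensitivity(term),
          triggers_diacritics_sensitivity(term))
```

           An initial capital alone ("Recoll") does not trigger case
           sensitivity, so capitalized words still match in the usual way.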
3412
3203
   loglevel,daemloglevel
3413
   loglevel,daemloglevel
3204
3414
3205
           Verbosity level for recoll and recollindex. A value of 4 lists
3415
           Verbosity level for recoll and recollindex. A value of 4 lists
3206
           quite a lot of debug/information messages. 2 only lists errors.
3416
           quite a lot of debug/information messages. 2 only lists errors.
3207
           The daem version (daemloglevel) is specific to the indexing
           monitor daemon.
3417
           The daem version (daemloglevel) is specific to the indexing
           monitor daemon.
...
...
3235
   monauxinterval
3445
   monauxinterval
3236
3446
3237
           Period (in seconds) at which the real time monitor will regenerate
3447
           Period (in seconds) at which the real time monitor will regenerate
3238
           the auxiliary databases (spelling, stemming) if needed. The
3448
           the auxiliary databases (spelling, stemming) if needed. The
3239
           default is one hour.
3449
           default is one hour.
3450
3451
   monioniceclass, monioniceclassdata
3452
3453
           These allow defining the ionice class and data used by the indexer
3454
           (default class 3, no data).
3240
3455
3241
   filtermaxseconds
3456
   filtermaxseconds
3242
3457
3243
           Maximum filter execution time, after which it is aborted. Some
3458
           Maximum filter execution time, after which it is aborted. Some
3244
           Postscript programs just loop...
3459
           Postscript programs just loop...
...
...
3280
3495
3281
           If this is set, the aspell dictionary generation is turned off.
3496
           If this is set, the aspell dictionary generation is turned off.
3282
           Useful for cases where you don't need the functionality or when it
3497
           Useful for cases where you don't need the functionality or when it
3283
           is unusable because aspell crashes during dictionary generation.
3498
           is unusable because aspell crashes during dictionary generation.
3284
3499
3500
   mhmboxquirks
3501
3502
           This allows defining location-related quirks for the mailbox
3503
           handler. Currently only the tbird flag is defined, and it should
3504
           be set for directories which hold Thunderbird data, as their
3505
           folder format is weird.
3506
3285
     ----------------------------------------------------------------------
3507
     ----------------------------------------------------------------------
3286
3508
3287
  5.4.2. The fields file
3509
  5.4.2. The fields file
3288
3510
3289
   This file contains information about dynamic fields handling in Recoll.
3511
   This file contains information about dynamic fields handling in Recoll.
...
...
3392
   link in a result list. For example, HTML is normally displayed using firefox, but
3614
   link in a result list. For example, HTML is normally displayed using firefox, but
3393
   you may prefer Konqueror, your openoffice.org program might be named
3615
   you may prefer Konqueror, your openoffice.org program might be named
3394
   ooffice instead of openoffice etc.
3616
   ooffice instead of openoffice etc.
3395
3617
3396
   Changes to this file can be done by direct editing, or through the recoll
3618
   Changes to this file can be done by direct editing, or through the recoll
3397
   user preferences dialog.
3619
   GUI preferences dialog.
3398
3620
3399
   If Use desktop preferences to choose document editor is checked in the
3621
   If Use desktop preferences to choose document editor is checked in the
3400
   Recoll GUI user preferences, all mimeview entries will be ignored except
3622
   Recoll GUI preferences, all mimeview entries will be ignored except the
3401
   the one labelled application/x-all (which is set to use xdg-open by
3623
   one labelled application/x-all (which is set to use xdg-open by default).
3402
   default).
3624
3625
   In this case, the xallexcepts top level variable defines a list of mime
3626
   type exceptions which will be processed according to the local entries
3627
   instead of being passed to the desktop. This is so that specific Recoll
3628
   options such as a page number or a search string can be passed to
3629
   applications that support them, such as the evince viewer.
3403
3630
3404
   As for the other configuration files, the normal usage is to have a
3631
   As for the other configuration files, the normal usage is to have a
3405
   mimeview inside your own configuration directory, with just the
3632
   mimeview inside your own configuration directory, with just the
3406
   non-default entries, which will override those from the central
3633
   non-default entries, which will override those from the central
3407
   configuration file.
3634
   configuration file.
3408
3635
3409
   Please note that these entries must be placed under a [view] section.
3636
   All viewer definition entries must be placed under a [view] section.
3410
3637
3411
   The keys in the file are normally mime types. You can add an application
3638
   The keys in the file are normally mime types. You can add an application
3412
   tag to specialize the choice for an area of the filesystem (using a
3639
   tag to specialize the choice for an area of the filesystem (using a
3413
   localfields specification in mimeconf). The syntax for the key is
3640
   localfields specification in mimeconf). The syntax for the key is
3414
   mimetype|tag
3641
   mimetype|tag
...
...
3433
       on the container type. If this appears in the command line, Recoll
3660
       on the container type. If this appears in the command line, Recoll
3434
       will not create a temporary file to extract the subdocument, expecting
3661
       will not create a temporary file to extract the subdocument, expecting
3435
       the called application (possibly a script) to be able to handle it.
3662
       the called application (possibly a script) to be able to handle it.
3436
3663
3437
     * %M. Mime type
3664
     * %M. Mime type
3665
3666
     * %p. Page index. Only significant for a subset of document types,
3667
       currently only PDF, Postscript and DVI files. Can be used to start the
3668
       editor at the right page for a match or snippet.
3669
3670
     * %s. Search term. The value will only be set for documents with indexed
3671
       page numbers (e.g. PDF). The value will be one of the matched search
3672
       terms. It would allow pre-setting the value in the "Find" entry inside
3673
       Evince for example, for easy highlighting of the term.
3438
3674
3439
     * %U, %u. Url.
3675
     * %U, %u. Url.
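
   To show how such substitutions could work, here is a small Python sketch
   which expands single-letter % escapes in a viewer command line. The
   function name, the sample values and the evince options used in the
   template are assumptions made for the example, not a description of the
   actual implementation.

```python
import re
import shlex

_ESCAPE = re.compile(r"%([A-Za-z])")


def expand_viewer_command(template, values):
    """Split a command template and replace %X escapes in each argument.

    `values` maps an escape letter (e.g. "p") to its replacement string.
    Unknown escapes are left untouched.
    """
    def substitute(match):
        return values.get(match.group(1), match.group(0))

    # Splitting before substituting keeps values with spaces in one argument.
    return [_ESCAPE.sub(substitute, arg) for arg in shlex.split(template)]


# Hypothetical entry and values for a PDF hit on page 3.
command = expand_viewer_command(
    "evince --page-index=%p --find=%s %f",
    {"p": "3", "s": "xapian", "f": "/tmp/my doc.pdf"},
)
print(command)
```

   Returning an argument list rather than a single string avoids quoting
   problems when a file name or search term contains spaces.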
3440
3676
3441
   In addition to the predefined values above, all strings like %(fieldname)
3677
   In addition to the predefined values above, all strings like %(fieldname)
3442
   will be replaced by the value of the field named fieldname for the
3678
   will be replaced by the value of the field named fieldname for the