a/src/README b/src/README
...
...
6
6
7
  Jean-Francois Dockes
7
  Jean-Francois Dockes
8
8
9
   <jfd@recoll.org>
9
   <jfd@recoll.org>
10
10
11
   Copyright (c) 2005-2013 Jean-Francois Dockes
11
   Copyright (c) 2005-2014 Jean-Francois Dockes
12
12
13
   Permission is granted to copy, distribute and/or modify this document
13
   Permission is granted to copy, distribute and/or modify this document
14
   under the terms of the GNU Free Documentation License, Version 1.3 or any
14
   under the terms of the GNU Free Documentation License, Version 1.3 or any
15
   later version published by the Free Software Foundation; with no Invariant
15
   later version published by the Free Software Foundation; with no Invariant
16
   Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the
16
   Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the
17
   license can be found at the following location: GNU web site.
17
   license can be found at the following location: GNU web site.
18
18
19
   This document introduces full text search notions and describes the
19
   This document introduces full text search notions and describes the
20
   installation and use of the Recoll application. It currently describes
20
   installation and use of the Recoll application. It currently describes
21
   Recoll 1.19.
21
   Recoll 1.20.
22
22
23
     ----------------------------------------------------------------------
23
     ----------------------------------------------------------------------
24
24
25
   Table of Contents
25
   Table of Contents
26
26
...
...
186
186
187
                             5.4.6. The ptrans file
187
                             5.4.6. The ptrans file
188
188
189
                             5.4.7. Examples of configuration adjustments
189
                             5.4.7. Examples of configuration adjustments
190
190
191
Chapter 1. Introduction
191
                            Chapter 1. Introduction
192
192
193
1.1. Giving it a try
193
1.1. Giving it a try
194
194
195
   If you do not like reading manuals (who does?) and would like to give
195
   If you do not like reading manuals (who does?) and would like to give
196
   Recoll a try, just install the application and start the recoll graphical
196
   Recoll a try, just install the application and start the recoll graphical
...
...
319
   options to help you find what you are looking for. However, there are
319
   options to help you find what you are looking for. However, there are
320
   other ways to perform Recoll searches: mostly a command line interface, a
320
   other ways to perform Recoll searches: mostly a command line interface, a
321
   Python programming interface, a KDE KIO slave module, and an Ubuntu Unity
321
   Python programming interface, a KDE KIO slave module, and an Ubuntu Unity
322
   Lens module.
322
   Lens module.
323
323
324
Chapter 2. Indexing
324
                              Chapter 2. Indexing
325
325
326
2.1. Introduction
326
2.1. Introduction
327
327
328
   Indexing is the process by which the set of documents is analyzed and the
328
   Indexing is the process by which the set of documents is analyzed and the
329
   data entered into the database. Recoll indexing is normally incremental:
329
   data entered into the database. Recoll indexing is normally incremental:
...
...
337
337
338
  2.1.1. Indexing modes
338
  2.1.1. Indexing modes
339
339
340
   Recoll indexing can be performed in two different modes:
340
   Recoll indexing can be performed in two different modes:
341
341
342
     o Periodic (or batch) indexing: indexing takes place at discrete times,
342
     * Periodic (or batch) indexing: indexing takes place at discrete times,
343
       by executing the recollindex command. The typical usage is to have a
343
       by executing the recollindex command. The typical usage is to have a
344
       nightly indexing run programmed into your cron file.
344
       nightly indexing run programmed into your cron file.
345
345
346
     o Real time indexing: indexing takes place as soon as a file is created
346
     * Real time indexing: indexing takes place as soon as a file is created
347
       or changed. recollindex runs as a daemon and uses a file system
347
       or changed. recollindex runs as a daemon and uses a file system
348
       alteration monitor such as inotify, Fam or Gamin to detect file
348
       alteration monitor such as inotify, Fam or Gamin to detect file
349
       changes.
349
       changes.
350
350
351
   The choice between the two methods is mostly a matter of preference, and
351
   The choice between the two methods is mostly a matter of preference, and
...
...
455
455
456
   The default location for the index data is the xapiandb subdirectory of
456
   The default location for the index data is the xapiandb subdirectory of
457
   the Recoll configuration directory, typically $HOME/.recoll/xapiandb/.
457
   the Recoll configuration directory, typically $HOME/.recoll/xapiandb/.
458
   This can be changed via two different methods (with different purposes):
458
   This can be changed via two different methods (with different purposes):
459
459
460
     o You can specify a different configuration directory by setting the
460
     * You can specify a different configuration directory by setting the
461
       RECOLL_CONFDIR environment variable, or using the -c option to the
461
       RECOLL_CONFDIR environment variable, or using the -c option to the
462
       Recoll commands. This method would typically be used to index
462
       Recoll commands. This method would typically be used to index
463
       different areas of the file system to different indexes. For example,
463
       different areas of the file system to different indexes. For example,
464
       if you were to issue the following commands:
464
       if you were to issue the following commands:
465
465
...
...
473
473
474
       Using multiple configuration directories and configuration options
474
       Using multiple configuration directories and configuration options
475
       allows you to tailor multiple configurations and indexes to handle
475
       allows you to tailor multiple configurations and indexes to handle
476
       whatever subset of the available data you wish to make searchable.
476
       whatever subset of the available data you wish to make searchable.
477
477
478
     o For a given configuration directory, you can specify a non-default
478
     * For a given configuration directory, you can specify a non-default
479
       storage location for the index by setting the dbdir parameter in the
479
       storage location for the index by setting the dbdir parameter in the
480
       configuration file (see the configuration section). This method would
480
       configuration file (see the configuration section). This method would
481
       mainly be of use if you wanted to keep the configuration directory in
481
       mainly be of use if you wanted to keep the configuration directory in
482
       its default location, but desired another location for the index,
482
       its default location, but desired another location for the index,
483
       typically out of disk occupation concerns.
483
       typically out of disk occupation concerns.
...
...
896
896
897
   Recoll provides a configuration option to specify the minimum time before
897
   Recoll provides a configuration option to specify the minimum time before
898
   which a file, specified by a wildcard pattern, cannot be reindexed. See
898
   which a file, specified by a wildcard pattern, cannot be reindexed. See
899
   the mondelaypatterns parameter in the configuration section.
899
   the mondelaypatterns parameter in the configuration section.
900
900
901
Chapter 3. Searching
901
                              Chapter 3. Searching
902
902
903
3.1. Searching with the Qt graphical user interface
903
3.1. Searching with the Qt graphical user interface
904
904
905
   The recoll program provides the main user interface for searching. It is
905
   The recoll program provides the main user interface for searching. It is
906
   based on the Qt library.
906
   based on the Qt library.
907
907
908
   recoll has two search modes:
908
   recoll has two search modes:
909
909
910
     o Simple search (the default, on the main screen) has a single entry
910
     * Simple search (the default, on the main screen) has a single entry
911
       field where you can enter multiple words.
911
       field where you can enter multiple words.
912
912
913
     o Advanced search (a panel accessed through the Tools menu or the
913
     * Advanced search (a panel accessed through the Tools menu or the
914
       toolbox bar icon) has multiple entry fields, which you may use to
914
       toolbox bar icon) has multiple entry fields, which you may use to
915
       build a logical condition, with additional filtering on file type,
915
       build a logical condition, with additional filtering on file type,
916
       location in the file system, modification date, and size.
916
       location in the file system, modification date, and size.
917
917
918
   In most cases, you can enter the terms as you think them, even if they
918
   In most cases, you can enter the terms as you think them, even if they
...
...
952
   File name will specifically look for file names. The point of having a
952
   File name will specifically look for file names. The point of having a
953
   separate file name search is that wild card expansion can be performed
953
   separate file name search is that wild card expansion can be performed
954
   more efficiently on a small subset of the index (allowing wild cards on
954
   more efficiently on a small subset of the index (allowing wild cards on
955
   the left of terms without excessive penalty). Things to know:
955
   the left of terms without excessive penalty). Things to know:
956
956
957
     o White space in the entry should match white space in the file name,
957
     * White space in the entry should match white space in the file name,
958
       and is not treated specially.
958
       and is not treated specially.
959
959
960
     o The search is insensitive to character case and accents, independently
960
     * The search is insensitive to character case and accents, independently
961
       of the type of index.
961
       of the type of index.
962
962
963
     o An entry without any wild card character and not capitalized will be
963
     * An entry without any wild card character and not capitalized will be
964
       prepended and appended with '*' (e.g. etc -> *etc*, but Etc -> etc).
964
       prepended and appended with '*' (e.g. etc -> *etc*, but Etc -> etc).
965
965
966
     o If you have a big index (many files), excessively generic fragments
966
     * If you have a big index (many files), excessively generic fragments
967
       may result in inefficient searches.
967
       may result in inefficient searches.
968
968
969
   You can search for exact phrases (adjacent words in a given order) by
969
   You can search for exact phrases (adjacent words in a given order) by
970
   enclosing the input inside double quotes. Ex: "virtual reality".
970
   enclosing the input inside double quotes. Ex: "virtual reality".
971
971
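   The file name search rules above can be sketched in a few lines of
   Python. This is only an illustration, not Recoll's implementation: the
   function names are hypothetical, the wild card set ('*', '?', '[') is an
   assumption, and accent folding is left out.

```python
import fnmatch

# Assumed set of wild card characters; the text above only shows '*'.
WILDCARD_CHARS = "*?["

def expand_entry(entry: str) -> str:
    """Apply the implicit '*' rule: etc -> *etc*, but Etc -> etc."""
    has_wildcard = any(ch in entry for ch in WILDCARD_CHARS)
    capitalized = entry[:1].isupper()
    if not has_wildcard and not capitalized:
        return "*" + entry + "*"
    # Case does not matter for matching, so fold it away.
    return entry.lower()

def filename_matches(entry: str, name: str) -> bool:
    """Case-insensitive match; white space is matched literally."""
    return fnmatch.fnmatchcase(name.lower(), expand_entry(entry).lower())

print(expand_entry("etc"))   # *etc*
print(expand_entry("Etc"))   # etc
```

   With these helpers, the entry etc matches a file named Sketch.txt (the
   fragment appears inside the name), while Etc only matches a file whose
   whole name is etc, in any letter case.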
...
...
1032
   standard desktop tool.
1032
   standard desktop tool.
1033
1033
1034
   You may also change the choice of applications by editing the mimeview
1034
   You may also change the choice of applications by editing the mimeview
1035
   configuration file if you find this more convenient.
1035
   configuration file if you find this more convenient.
1036
1036
1037
   Each result entry also has a right-click menu with an Open With entry.
1038
   This lets you choose an application from the list of those which
1039
   registered with the desktop for the document MIME type.
1040
1037
   The Preview and Open edit links may not be present for all entries,
1041
   The Preview and Open edit links may not be present for all entries,
1038
   meaning that Recoll has no configured way to preview a given file type
1042
   meaning that Recoll has no configured way to preview a given file type
1039
   (which was indexed by name only), or no configured external editor for the
1043
   (which was indexed by name only), or no configured external editor for the
1040
   file type. This can sometimes be adjusted simply by tweaking the mimemap
1044
   file type. This can sometimes be adjusted simply by tweaking the mimemap
1041
   and mimeview configuration files (the latter can be modified with the user
1045
   and mimeview configuration files (the latter can be modified with the user
...
...
1069
1073
1070
   Apart from the preview and edit links, you can display a pop-up menu by
1074
   Apart from the preview and edit links, you can display a pop-up menu by
1071
   right-clicking over a paragraph in the result list. This menu has the
1075
   right-clicking over a paragraph in the result list. This menu has the
1072
   following entries:
1076
   following entries:
1073
1077
1074
     o Preview
1078
     * Preview
1075
1079
1076
     o Open
1080
     * Open
1077
1081
1078
     o Copy File Name
1082
     * Copy File Name
1079
1083
1080
     o Copy Url
1084
     * Copy Url
1081
1085
1082
     o Save to File
1086
     * Save to File
1083
1087
1084
     o Find similar
1088
     * Find similar
1085
1089
1086
     o Preview Parent document
1090
     * Preview Parent document
1087
1091
1088
     o Open Parent document
1092
     * Open Parent document
1089
1093
1090
     o Open Snippets Window
1094
     * Open Snippets Window
1091
1095
1092
   The Preview and Open entries do the same thing as the corresponding links.
1096
   The Preview and Open entries do the same thing as the corresponding links.
1093
1097
1094
   The Copy File Name and Copy Url copy the relevant data to the clipboard,
1098
   The Copy File Name and Copy Url copy the relevant data to the clipboard,
1095
   for later pasting.
1099
   for later pasting.
...
...
1256
1260
1257
   This part of the dialog lets you construct a query by combining multiple
1261
   This part of the dialog lets you construct a query by combining multiple
1258
   clauses of different types. Each entry field is configurable for the
1262
   clauses of different types. Each entry field is configurable for the
1259
   following modes:
1263
   following modes:
1260
1264
1261
     o All terms.
1265
     * All terms.
1262
1266
1263
     o Any term.
1267
     * Any term.
1264
1268
1265
     o None of the terms.
1269
     * None of the terms.
1266
1270
1267
     o Phrase (exact terms in order within an adjustable window).
1271
     * Phrase (exact terms in order within an adjustable window).
1268
1272
1269
     o Proximity (terms in any order within an adjustable window).
1273
     * Proximity (terms in any order within an adjustable window).
1270
1274
1271
     o Filename search.
1275
     * Filename search.
1272
1276
1273
   Additional entry fields can be created by clicking the Add clause button.
1277
   Additional entry fields can be created by clicking the Add clause button.
1274
1278
1275
   When searching, the non-empty clauses will be combined either with an AND
1279
   When searching, the non-empty clauses will be combined either with an AND
1276
   or an OR conjunction, depending on the choice made on the left (All
1280
   or an OR conjunction, depending on the choice made on the left (All
...
...
1295
    3.1.6.2. Advanced search: the "filter" tab
1299
    3.1.6.2. Advanced search: the "filter" tab
1296
1300
1297
   This part of the dialog has several sections which allow filtering the
1301
   This part of the dialog has several sections which allow filtering the
1298
   results of a search according to a number of criteria:
1302
   results of a search according to a number of criteria:
1299
1303
1300
     o The first section allows filtering by dates of last modification. You
1304
     * The first section allows filtering by dates of last modification. You
1301
       can specify both a minimum and a maximum date. The initial values are
1305
       can specify both a minimum and a maximum date. The initial values are
1302
       set according to the oldest and newest documents found in the index.
1306
       set according to the oldest and newest documents found in the index.
1303
1307
1304
     o The next section allows filtering the results by file size. There are
1308
     * The next section allows filtering the results by file size. There are
1305
       two entries for minimum and maximum size. Enter decimal numbers. You
1309
       two entries for minimum and maximum size. Enter decimal numbers. You
1306
       can use suffix multipliers: k/K, m/M, g/G, t/T for 1E3, 1E6, 1E9, 1E12
1310
       can use suffix multipliers: k/K, m/M, g/G, t/T for 1E3, 1E6, 1E9, 1E12
1307
       respectively.
1311
       respectively.
1308
1312
1309
     o The next section allows filtering the results by their MIME types, or
1313
     * The next section allows filtering the results by their MIME types, or
1310
       MIME categories (e.g. media/text/message/etc.).
1314
       MIME categories (e.g. media/text/message/etc.).
1311
1315
1312
       You can transfer the types between two boxes, to define which will be
1316
       You can transfer the types between two boxes, to define which will be
1313
       included or excluded by the search.
1317
       included or excluded by the search.
1314
1318
1315
       The state of the file type selection can be saved as the default (the
1319
       The state of the file type selection can be saved as the default (the
1316
       file type filter will not be activated at program start-up, but the
1320
       file type filter will not be activated at program start-up, but the
1317
       lists will be in the restored state).
1321
       lists will be in the restored state).
1318
1322
1319
     o The bottom section allows restricting the search results to a sub-tree
1323
     * The bottom section allows restricting the search results to a sub-tree
1320
       of the indexed area. You can use the Invert checkbox to search for
1324
       of the indexed area. You can use the Invert checkbox to search for
1321
       files not in the sub-tree instead. If you use directory filtering
1325
       files not in the sub-tree instead. If you use directory filtering
1322
       often and on big subsets of the file system, you may think of setting
1326
       often and on big subsets of the file system, you may think of setting
1323
       up multiple indexes instead, as the performance may be better.
1327
       up multiple indexes instead, as the performance may be better.
1324
1328
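   The size suffix multipliers described for the filter tab above (k/K, m/M,
   g/G, t/T for 1E3, 1E6, 1E9, 1E12) can be illustrated with a small Python
   sketch. The function name parse_size is hypothetical, and treating the
   input as a plain base-10 integer is an assumption; this is not Recoll's
   own parser.

```python
# Decimal (power of ten) multipliers, as listed in the text above.
MULTIPLIERS = {"k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12}

def parse_size(text: str) -> int:
    """Convert an entry such as '10k' or '2M' to a size in bytes."""
    text = text.strip()
    if not text:
        raise ValueError("empty size entry")
    suffix = text[-1].lower()
    if suffix in MULTIPLIERS:
        # int() raises ValueError if nothing numeric precedes the suffix.
        return int(text[:-1]) * MULTIPLIERS[suffix]
    return int(text)

print(parse_size("10k"))  # 10000
print(parse_size("2M"))   # 2000000
```

   Note that the multipliers are powers of ten, so 10k means 10000 bytes
   rather than 10240.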
...
...
1553
   which will let you adjust what columns are displayed. You can drag the
1557
   which will let you adjust what columns are displayed. You can drag the
1554
   column headers to adjust their order. You can click them to sort by the
1558
   column headers to adjust their order. You can click them to sort by the
1555
   field displayed in the column. You can also save the result list in CSV
1559
   field displayed in the column. You can also save the result list in CSV
1556
   format.
1560
   format.
1557
1561
1562
   Changing the GUI geometry. It is possible to configure the GUI in wide
1563
   form factor by dragging the toolbars to one of the sides (their location
1564
   is remembered between sessions), and moving the category filters to a menu
1565
   (can be set in the Preferences -> GUI configuration -> User interface
1566
   panel).
1567
1558
   Query explanation. You can get an exact description of what the query
1568
   Query explanation. You can get an exact description of what the query
1559
   looked for, including stem expansion, and Boolean operators used, by
1569
   looked for, including stem expansion, and Boolean operators used, by
1560
   clicking on the result list header.
1570
   clicking on the result list header.
1561
1571
1562
   Advanced search history. As of Recoll 1.18, you can display any of the
1572
   Advanced search history. As of Recoll 1.18, you can display any of the
...
...
1599
   the parameters used for searching and returning results, and what indexes
1609
   the parameters used for searching and returning results, and what indexes
1600
   are searched.
1610
   are searched.
1601
1611
1602
   User interface parameters: 
1612
   User interface parameters: 
1603
1613
1604
     o Highlight color for query terms: Terms from the user query are
1614
     * Highlight color for query terms: Terms from the user query are
1605
       highlighted in the result list samples and the preview window. The
1615
       highlighted in the result list samples and the preview window. The
1606
       color can be chosen here. Any Qt color string should work (e.g. red,
1616
       color can be chosen here. Any Qt color string should work (e.g. red,
1607
       #ff0000). The default is blue.
1617
       #ff0000). The default is blue.
1608
1618
1609
     o Style sheet: The name of a Qt style sheet text file which is applied
1619
     * Style sheet: The name of a Qt style sheet text file which is applied
1610
       to the whole Recoll application on startup. The default value is
1620
       to the whole Recoll application on startup. The default value is
1611
       empty, but there is a skeleton style sheet (recoll.qss) inside the
1621
       empty, but there is a skeleton style sheet (recoll.qss) inside the
1612
       /usr/share/recoll/examples directory. Using a style sheet, you can
1622
       /usr/share/recoll/examples directory. Using a style sheet, you can
1613
       change most recoll graphical parameters: colors, fonts, etc. See the
1623
       change most recoll graphical parameters: colors, fonts, etc. See the
1614
       sample file for a few simple examples.
1624
       sample file for a few simple examples.
...
...
1619
       set the foreground to a light color and the background to a dark one
1629
       set the foreground to a light color and the background to a dark one
1620
       in the desktop preferences, but only the background is set inside the
1630
       in the desktop preferences, but only the background is set inside the
1621
       Recoll style sheet, and it is light too, then text will appear
1631
       Recoll style sheet, and it is light too, then text will appear
1622
       light-on-light inside the Recoll GUI.
1632
       light-on-light inside the Recoll GUI.
1623
1633
1624
     o Maximum text size highlighted for preview: inserting highlights on
1634
     * Maximum text size highlighted for preview: inserting highlights on
1625
       search term inside the text before inserting it in the preview window
1635
       search term inside the text before inserting it in the preview window
1626
       involves quite a lot of processing, and can be disabled over the given
1636
       involves quite a lot of processing, and can be disabled over the given
1627
       text size to speed up loading.
1637
       text size to speed up loading.
1628
1638
1629
     o Prefer HTML to plain text for preview: if set, Recoll will display HTML
1639
     * Prefer HTML to plain text for preview: if set, Recoll will display HTML
1630
       as such inside the preview window. If this causes problems with the Qt
1640
       as such inside the preview window. If this causes problems with the Qt
1631
       HTML display, you can uncheck it to display the plain text version
1641
       HTML display, you can uncheck it to display the plain text version
1632
       instead.
1642
       instead.
1633
1643
1634
     o Plain text to HTML line style: when displaying plain text inside the
1644
     * Plain text to HTML line style: when displaying plain text inside the
1635
       preview window, Recoll tries to preserve some of the original text
1645
       preview window, Recoll tries to preserve some of the original text
1636
       line breaks and indentation. It can either use PRE HTML tags, which
1646
       line breaks and indentation. It can either use PRE HTML tags, which
1637
       will preserve the indentation well but will force horizontal scrolling
1647
       will preserve the indentation well but will force horizontal scrolling
1638
       for long lines, or use BR tags to break at the original line breaks,
1648
       for long lines, or use BR tags to break at the original line breaks,
1639
       which will let the editor introduce other line breaks according to the
1649
       which will let the editor introduce other line breaks according to the
1640
       window width, but will lose some of the original indentation. The
1650
       window width, but will lose some of the original indentation. The
1641
       third option has been available in recent releases and is probably now
1651
       third option has been available in recent releases and is probably now
1642
       the best one: use PRE tags with line wrapping.
1652
       the best one: use PRE tags with line wrapping.
1643
1653
1644
     o Use desktop preferences to choose document editor: if this is checked,
1654
     * Use desktop preferences to choose document editor: if this is checked,
1645
       the xdg-open utility will be used to open files when you click the
1655
       the xdg-open utility will be used to open files when you click the
1646
       Open link in the result list, instead of the application defined in
1656
       Open link in the result list, instead of the application defined in
1647
       mimeview. xdg-open will in turn use your desktop preferences to choose
1657
       mimeview. xdg-open will in turn use your desktop preferences to choose
1648
       an appropriate application.
1658
       an appropriate application.
1649
1659
1650
     o Exceptions: when using the desktop preferences for opening documents,
1660
     * Exceptions: when using the desktop preferences for opening documents,
1651
       these are MIME types that will still be opened according to Recoll
1661
       these are MIME types that will still be opened according to Recoll
1652
       preferences. This is useful for passing parameters like page numbers
1662
       preferences. This is useful for passing parameters like page numbers
1653
       or search strings to applications that support them (e.g. evince).
1663
       or search strings to applications that support them (e.g. evince).
1654
       This cannot be done with xdg-open which only supports passing one
1664
       This cannot be done with xdg-open which only supports passing one
1655
       parameter.
1665
       parameter.
1656
1666
1657
     o Choose editor applications: this will let you choose the command
1667
     * Choose editor applications: this will let you choose the command
1658
       started by the Open links inside the result list, for specific
1668
       started by the Open links inside the result list, for specific
1659
       document types.
1669
       document types.
1660
1670
1661
     o Display category filter as toolbar... this will let you choose if the
1671
     * Display category filter as toolbar... this will let you choose if the
1662
       document categories are displayed as a list or a set of buttons.
1672
       document categories are displayed as a list or a set of buttons.
1663
1673
1664
     o Auto-start simple search on white space entry: if this is checked, a
1674
     * Auto-start simple search on white space entry: if this is checked, a
1665
       search will be executed each time you enter a space in the simple
1675
       search will be executed each time you enter a space in the simple
1666
       search input field. This lets you look at the result list as you enter
1676
       search input field. This lets you look at the result list as you enter
1667
       new terms. This is off by default, you may like it or not...
1677
       new terms. This is off by default, you may like it or not...
1668
1678
1669
     o Start with advanced search dialog open: if you use this dialog
1679
     * Start with advanced search dialog open: if you use this dialog
1670
       frequently, checking the entries will get it to open when recoll
1680
       frequently, checking the entries will get it to open when recoll
1671
       starts.
1681
       starts.
1672
1682
1673
     o Remember sort activation state: if set, Recoll will remember the sort
1683
     * Remember sort activation state: if set, Recoll will remember the sort
1674
       tool state between invocations. It normally starts with sorting
1684
       tool state between invocations. It normally starts with sorting
1675
       disabled.
1685
       disabled.
1676
1686
1677
   Result list parameters: 
1687
   Result list parameters: 
1678
1688
1679
     o Number of results in a result page
1689
     * Number of results in a result page
1680
1690
1681
     o Result list font: There is quite a lot of information shown in the
1691
     * Result list font: There is quite a lot of information shown in the
1682
       result list, and you may want to customize the font and/or font size.
1692
       result list, and you may want to customize the font and/or font size.
1683
       The rest of the fonts used by Recoll are determined by your generic Qt
1693
       The rest of the fonts used by Recoll are determined by your generic Qt
1684
       config (try the qtconfig command).
1694
       config (try the qtconfig command).
1685
1695
1686
     o Edit result list paragraph format string: allows you to change the
1696
     * Edit result list paragraph format string: allows you to change the
1687
       presentation of each result list entry. See the result list
1697
       presentation of each result list entry. See the result list
1688
       customisation section.
1698
       customisation section.
1689
1699
1690
     o Edit result page HTML header insert: allows you to define text
1700
     * Edit result page HTML header insert: allows you to define text
1691
       inserted at the end of the result page HTML header. More detail in the
1701
       inserted at the end of the result page HTML header. More detail in the
1692
       result list customisation section.
1702
       result list customisation section.
1693
1703
1694
     o Date format: allows specifying the format used for displaying dates
1704
     * Date format: allows specifying the format used for displaying dates
1695
       inside the result list. This should be specified as an strftime()
1705
       inside the result list. This should be specified as an strftime()
1696
       string (man strftime).
1706
       string (man strftime).
1697
1707
1698
     o Abstract snippet separator: for synthetic abstracts built from index
1708
     * Abstract snippet separator: for synthetic abstracts built from index
1699
       data, which are usually made of several snippets from different parts
1709
       data, which are usually made of several snippets from different parts
1700
       of the document, this defines the snippet separator, an ellipsis by
1710
       of the document, this defines the snippet separator, an ellipsis by
1701
       default.
1711
       default.
1702
1712
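   The Date format option above takes an strftime() string. As a rough
   illustration, the Python sketch below uses datetime.strftime, which
   accepts the same family of conversion specifiers. The sample date, the
   format strings and the helper name format_date are made up for the
   example and are not Recoll defaults.

```python
from datetime import datetime

# Hypothetical document modification time, chosen only for illustration.
sample = datetime(2014, 3, 5, 14, 30)

def format_date(when: datetime, fmt: str) -> str:
    """Render a date with an strftime()-style format string."""
    return when.strftime(fmt)

iso_like = format_date(sample, "%Y-%m-%d %H:%M")
print(iso_like)                          # 2014-03-05 14:30
print(format_date(sample, "%d/%m/%Y"))   # 05/03/2014
```

   Trying a format string this way before entering it in the preferences
   can help check that it renders dates as intended.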
1703
   Search parameters: 
1713
   Search parameters: 
1704
1714
1705
     o Hide duplicate results: decides if result list entries are shown for
1715
     * Hide duplicate results: decides if result list entries are shown for
1706
       identical documents found in different places.
1716
       identical documents found in different places.
1707
1717
1708
     o Stemming language: stemming obviously depends on the document's
1718
     * Stemming language: stemming obviously depends on the document's
1709
       language. This listbox will let you choose among the stemming databases
1719
       language. This listbox will let you choose among the stemming databases
1710
       which were built during indexing (this is set in the main
1720
       which were built during indexing (this is set in the main
1711
       configuration file), or later added with recollindex -s (See the
1721
       configuration file), or later added with recollindex -s (See the
1712
       recollindex manual). Stemming languages which are dynamically added
1722
       recollindex manual). Stemming languages which are dynamically added
1713
       will be deleted at the next indexing pass unless they are also added
1723
       will be deleted at the next indexing pass unless they are also added
1714
       in the configuration file.
1724
       in the configuration file.
1715
1725
1716
     o Automatically add phrase to simple searches: a phrase will be
1726
     * Automatically add phrase to simple searches: a phrase will be
1717
       automatically built and added to simple searches when looking for Any
1727
       automatically built and added to simple searches when looking for Any
1718
       terms. This will give a relevance boost to the results where the
1728
       terms. This will give a relevance boost to the results where the
1719
       search terms appear as a phrase (consecutive and in order).
1729
       search terms appear as a phrase (consecutive and in order).
1720
1730
1721
     o Autophrase term frequency threshold percentage: very frequent terms
1731
     * Autophrase term frequency threshold percentage: very frequent terms
1722
       should not be included in automatic phrase searches for performance
1732
       should not be included in automatic phrase searches for performance
1723
       reasons. The parameter defines the cutoff percentage (percentage of
1733
       reasons. The parameter defines the cutoff percentage (percentage of
1724
       the documents where the term appears).
1734
       the documents where the term appears).
1725
1735
1726
     o Replace abstracts from documents: this decides if we should synthesize
1736
     * Replace abstracts from documents: this decides if we should synthesize
1727
       and display an abstract in place of an explicit abstract found within
1737
       and display an abstract in place of an explicit abstract found within
1728
       the document itself.
1738
       the document itself.
1729
1739
1730
     o Dynamically build abstracts: this decides if Recoll tries to build
1740
     * Dynamically build abstracts: this decides if Recoll tries to build
1731
       document abstracts (lists of snippets) when displaying the result
1741
       document abstracts (lists of snippets) when displaying the result
1732
       list. Abstracts are constructed by taking context from the document
1742
       list. Abstracts are constructed by taking context from the document
1733
       information, around the search terms.
1743
       information, around the search terms.
1734
1744
1735
     o Synthetic abstract size: adjust to taste...
1745
     * Synthetic abstract size: adjust to taste...
1736
1746
1737
     o Synthetic abstract context words: how many words should be displayed
1747
     * Synthetic abstract context words: how many words should be displayed
1738
       around each term occurrence.
1748
       around each term occurrence.
1739
1749
1740
     o Query language magic file name suffixes: a list of words which
1750
     * Query language magic file name suffixes: a list of words which
1741
       automatically get turned into ext:xxx file name suffix clauses when
1751
       automatically get turned into ext:xxx file name suffix clauses when
1742
       starting a query language query (ie: doc xls xlsx...). This will save
1752
       starting a query language query (ie: doc xls xlsx...). This will save
1743
       some typing for people who use file types a lot when querying.
1753
       some typing for people who use file types a lot when querying.
1744
1754
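   To make the abstract parameters above more concrete, here is a minimal
   sketch of how a synthetic abstract could be assembled from context words
   around each search term, with a separator between snippets. It is an
   illustration only, not Recoll's actual algorithm: the function name, the
   word-based tokenisation and the merging of overlapping snippets are all
   assumptions.

```python
def build_abstract(words, terms, context=2, separator="..."):
    """Join snippets of `context` words around each matching term."""
    wanted = {t.lower() for t in terms}
    # Collect [start, end) word ranges around each occurrence of a term.
    ranges = []
    for i, word in enumerate(words):
        if word.lower().strip(".,;:!?") in wanted:
            start = max(0, i - context)
            end = min(len(words), i + context + 1)
            if ranges and start <= ranges[-1][1]:
                # Overlaps or touches the previous snippet: merge them.
                ranges[-1][1] = max(ranges[-1][1], end)
            else:
                ranges.append([start, end])
    snippets = [" ".join(words[s:e]) for s, e in ranges]
    return f" {separator} ".join(snippets)


text = ("Recoll builds an index of your documents and then lets you search "
        "the index with simple or advanced queries from the desktop")
print(build_abstract(text.split(), ["index", "queries"], context=2))
```

   Raising the context value makes each snippet longer, which is the kind
   of trade-off the context words and abstract size settings control.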
   External indexes: This panel will let you browse for additional indexes
...
    3.1.12.1. The result list format

   The result list presentation can be exhaustively customized by adjusting
   two elements:

     * The paragraph format

     * HTML code inside the header section

   These can be edited from the Result list tab of the GUI configuration.

   Newer versions of Recoll (from 1.17) use a WebKit HTML object by default
   (this may be disabled at build time), and total customisation is possible
...
      The paragraph format

   This is an arbitrary HTML string where the following printf-like %
   substitutions will be performed:

     * %A. Abstract

     * %D. Date

     * %I. Icon image name. This is normally determined from the MIME type.
       The associations are defined inside the mimeconf configuration file.
       If a thumbnail for the file is found at the standard Freedesktop
       location, this will be displayed instead.

     * %K. Keywords (if any)

     * %L. Precooked Preview, Edit, and possibly Snippets links

     * %M. MIME type

     * %N. result Number inside the result page

     * %P. Parent folder Url. In the case of an embedded document, this is
       the parent folder for the top level container file.

     * %R. Relevance percentage

     * %S. Size information

     * %T. Title or Filename if not set.

     * %t. Title or Filename if not set.

     * %U. Url

   The format of the Preview, Edit, and Snippets links is <a href="P%N">, <a
   href="E%N"> and <a href="A%N">, where %N expands to the document number
   inside the result page.

   It is also possible to use a "F%N" value as a link target. This will open
   the document corresponding to the %P parent folder expansion, usually
   creating a file manager window on the folder where the container file
   resides. E.g.:

 <a href="F%N">%P</a>

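   The single-letter substitutions described above can be pictured with the
   following sketch, which expands a paragraph format against a dictionary
   of values. This is not Recoll's code: the function name, the sample
   format and the sample values are hypothetical, and unknown letters are
   simply left untouched here, which is an assumption.

```python
import re


def expand_paragraph(fmt, values):
    """Expand %X placeholders in fmt using the values dictionary."""
    def replace(match):
        letter = match.group(1)
        # Leave unknown placeholders as they are (assumed behaviour).
        return str(values.get(letter, match.group(0)))
    return re.sub(r"%([A-Za-z])", replace, fmt)


# Hypothetical paragraph format using only substitutions listed above.
paragraph_format = (
    '<p>%N. <b>%T</b> (%M)<br>%A '
    '<a href="P%N">Preview</a> <a href="F%N">%P</a></p>'
)

# Hypothetical values for one result.
sample = {
    "N": 3,
    "T": "Trip report",
    "M": "text/html",
    "A": "a snippet",
    "P": "file:///home/me/docs",
}

print(expand_paragraph(paragraph_format, sample))
```

   Note how %N is expanded inside the link targets as well, which is what
   produces per-result targets such as P3 or F3.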
   In addition to the predefined values above, all strings like %(fieldname)
   will be replaced by the value of the field named fieldname for this
   document. Only stored fields can be accessed in this way: the value of
   indexed but not stored fields is not known at this point in the search
...
3.3. Searching on the command line

   There are several ways to obtain search results as a text stream, without
   a graphical interface:

     * By passing option -t to the recoll program.

     * By using the recollq program.

     * By writing a custom Python program, using the Recoll Python API.

   The first two methods work in the same way and accept/need the same
   arguments (except for the additional -t to recoll). The query to be
   executed is specified as command line arguments.

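   As a small illustration of the first two methods, the sketch below builds
   the argument list for either program and runs it from Python. Only the
   program names and the -t option come from the text above; no other
   option is assumed, and the helper names are hypothetical. The search is
   only run if the program is found on the PATH.

```python
import shutil
import subprocess


def build_command(query_words, use_recollq=True):
    """Build the argument list for a text-mode search."""
    program = ["recollq"] if use_recollq else ["recoll", "-t"]
    # The query is passed as plain command line arguments.
    return program + list(query_words)


def run_search(query_words, use_recollq=True):
    """Run the search; return its text output, or None if not installed."""
    cmd = build_command(query_words, use_recollq)
    if shutil.which(cmd[0]) is None:
        return None
    done = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return done.stdout


print(build_command(["author:john", "title:report"]))
print(build_command(["author:john"], use_recollq=False))
```

   Calling run_search(["author:john"]) would then return whatever text the
   program prints, to be parsed as needed.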
...

   In some cases, the document paths stored inside the index do not match the
   actual ones, so that document previews and accesses will fail. This can
   occur in a number of circumstances:

     * When using multiple indexes it is a relatively common occurrence that
       some will actually reside on a remote volume, for example mounted via
       NFS. In this case, the paths used to access the documents on the local
       machine are not necessarily the same as the ones used while indexing
       on the remote machine. For example, /home/me may have been used as a
       topdirs element while indexing, but the directory might be mounted as
       /net/server/home/me on the local machine.

     * The case may also occur with removable disks. It is perfectly possible
       to configure an index to live with the documents on the removable
       disk, but it may happen that the disk is not mounted at the same place
       so that the document paths from the index are invalid.

     * As a last example, one could imagine that a big directory has been
       moved, but that it is currently inconvenient to run the indexer.

   More generally, the path translation facility may be useful whenever the
   document paths seen by the indexer are not the same as the ones which
   should be used at query time.
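
   The idea behind path translation can be sketched as a prefix rewrite
   applied to each path coming out of the index. The code below is only an
   illustration of that idea, using the /home/me example from the text: it
   does not show the syntax of Recoll's own translation configuration, and
   the function name and rule format are assumptions.

```python
def translate_path(path, translations):
    """Replace the longest matching indexed prefix with its local form."""
    rules = {old.rstrip("/"): new.rstrip("/")
             for old, new in translations.items()}
    # Try longer prefixes first so that the most specific rule wins.
    for old in sorted(rules, key=len, reverse=True):
        # Match whole path components only (/home/me, not /home/meeting).
        if path == old or path.startswith(old + "/"):
            return rules[old] + path[len(old):]
    return path


# Indexed prefix -> prefix to use on the local machine.
rules = {"/home/me": "/net/server/home/me"}

print(translate_path("/home/me/docs/report.pdf", rules))
print(translate_path("/home/meeting/notes.txt", rules))
```

   Paths that match no rule are returned unchanged, so the translation is
   harmless for documents that were indexed locally.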
...

   As usual, words inside quotes define a phrase (the order of words is
   significant), so that title:"prejudice pride" is not the same as
   title:prejudice title:pride, and is unlikely to find a result.

   To save you some typing, recent Recoll versions (1.20 and later) interpret
   a comma-separated list of terms as an AND list inside the field. Use slash
   characters ('/') for an OR list. No white space is allowed. So

 author:john,lennon

   will search for documents with john and lennon inside the author field (in
   any order), and

 author:john/ringo

   would search for john or ringo.

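   One way to read the shorthand above is as an expansion into one clause
   per term. The sketch below performs that expansion on a single
   field:value clause. It is an interpretation for illustration, not the
   Recoll parser: the function name is hypothetical, a value mixing commas
   and slashes is not described in the text (commas are arbitrarily given
   priority here), and the sketch must not be applied to criteria such as
   dir or date, whose values legitimately contain slashes.

```python
def expand_field_list(clause):
    """Expand 'field:a,b' (AND) or 'field:a/b' (OR) into simple clauses."""
    field, _, value = clause.partition(":")
    if "," in value:
        # AND is the query language default, so a space is enough.
        terms, joiner = value.split(","), " "
    elif "/" in value:
        terms, joiner = value.split("/"), " OR "
    else:
        terms, joiner = [value], " "
    return joiner.join(f"{field}:{term}" for term in terms)


print(expand_field_list("author:john,lennon"))
print(expand_field_list("author:john/ringo"))
```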
   Modifiers can be set on a phrase clause, for example to specify a
   proximity search (unordered). See the modifier section.

   Recoll currently manages the following default fields:

     * title, subject or caption are synonyms which specify data to be
       searched for in the document title or subject.

     * author or from for searching the document's originators.

     * recipient or to for searching the document's recipients.

     * keyword for searching the document-specified keywords (few documents
       actually have any).

     * filename for the document's file name. This is not necessarily set for
       all documents: internal documents contained inside a compound one (for
       example an EPUB section) do not inherit the container file name any
       more; this was replaced by an explicit field (see next). Sub-documents
       can still have a specific filename, if it is implied by the document
       format, for example the attachment file name for an email attachment.

     * containerfilename. This is set for all documents, both top-level and
       contained sub-documents, and is always the name of the filesystem
       directory entry which contains the data. The terms from this field can
       only be matched by an explicit field specification (as opposed to
       terms from filename which are also indexed as general document
       content). This avoids getting matches for all the sub-documents when
       searching for the container file name.

     * ext specifies the file name extension (Ex: ext:html)

   Recoll 1.20 and later have a way to specify aliases for the field names,
   which will save typing, for example by aliasing filename to fn or
   containerfilename to cfn. See the section about the fields file.

   The field syntax also supports a few field-like, but special, criteria:

     * dir for filtering the results on file location (Ex:
       dir:/home/me/somedir). -dir also works to find results not in the
       specified directory (release >= 1.15.8). Tilde expansion will be
       performed as usual (except for a bug in versions 1.19 to 1.19.11p1).
       Wildcards will be expanded, but please have a look at an important
       limitation of wildcards in path filters.
...
       and are best avoided.

       You need to use double-quotes around the path value if it contains
       space characters.

     * size for filtering the results on file size. Example: size<10000. You
       can use <, > or = as operators. You can specify a range like the
       following: size>100 size<1000. The usual k/K, m/M, g/G, t/T can be
       used as (decimal) multipliers. Ex: size>1k to search for files bigger
       than 1000 bytes.

     * date for searching or filtering on dates. The syntax for the argument
       is based on the ISO8601 standard for dates and time intervals. Only
       dates are supported, no times. The general syntax is 2 elements
       separated by a / character. Each element can be a date or a period of
       time. Periods are specified as PnYnMnD. The n numbers are the
       respective numbers of years, months or days, any of which may be
       missing. Dates are specified as YYYY-MM-DD. The days and months parts
       may be missing. If the / is present but an element is missing, the
       missing element is interpreted as the lowest or highest date in the
       index. Examples:

          * 2001-03-01/2002-05-01 the basic syntax for an interval of dates.

          * 2001-03-01/P1Y2M the same specified with a period.

          * 2001/ from the beginning of 2001 to the latest date in the index.

          * 2001 the whole year of 2001

          * P2D/ means 2 days ago up to now if there are no documents with
            dates in the future.

          * /2003 all documents from 2003 or older.

       Periods can also be specified with small letters (e.g.: p2y).

     * mime or format for specifying the MIME type. This one is quite special
       because you can specify several values which will be OR'ed (the normal
       default for the language is AND). Ex: mime:text/plain mime:text/html.
       Specifying an explicit boolean operator before a mime specification is
       not supported and will produce strange results. You can filter out
       certain types by using negation (-mime:some/type), and you can use
       wildcards in the value (mime:text/*). Note that mime is the ONLY field
       with an OR default. You do need to use OR with ext terms for example.

     * type or rclcat for specifying the category (as in
       text/media/presentation/etc.). The classification of MIME types in
       categories is defined in the Recoll configuration (mimeconf), and can
       be modified or extended. The default category names are those which
       permit filtering results in the main GUI screen. Categories are OR'ed
       like MIME types above. This can't be negated with - either.
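
   The date criterion described above states that 2001-03-01/P1Y2M denotes
   the same interval as 2001-03-01/2002-05-01. The sketch below checks that
   arithmetic by parsing a PnYnMnD period and adding it to a start date. It
   only covers the "full date/period" form, and it is not Recoll's parser:
   the function names are hypothetical, and clamping the day for shorter
   months is an assumption, as the text does not say how such cases are
   handled.

```python
import calendar
import re
from datetime import date, timedelta

# PnYnMnD, any part may be missing, small letters accepted.
PERIOD_RE = re.compile(r"^[Pp](?:(\d+)[Yy])?(?:(\d+)[Mm])?(?:(\d+)[Dd])?$")


def parse_period(text):
    """Parse a period string into a (years, months, days) tuple."""
    match = PERIOD_RE.match(text)
    if match is None:
        raise ValueError(f"not a period: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def add_period(start, period):
    """Return the date reached by adding (years, months, days) to start."""
    years, months, days = period
    month_index = start.year * 12 + (start.month - 1) + years * 12 + months
    year, month = divmod(month_index, 12)
    month += 1
    # Assumption: clamp the day when the target month is shorter.
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day) + timedelta(days=days)


def resolve_interval(text):
    """Resolve 'YYYY-MM-DD/PnYnMnD' into a (start, end) pair of dates."""
    first, _, second = text.partition("/")
    start = date.fromisoformat(first)
    return start, add_period(start, parse_period(second))


print(resolve_interval("2001-03-01/P1Y2M"))
```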
...
   Some characters are recognized as search modifiers when found immediately
   after the closing double quote of a phrase, as in "some
   term"modifierchars. The actual "phrase" can be a single term of course.
   Supported modifiers:

     * l can be used to turn off stemming (mostly makes sense with p because
       stemming is off by default for phrases).

     * o can be used to specify a "slack" for phrase and proximity searches:
       the number of additional terms that may be found between the specified
       ones. If o is followed by an integer number, this is the slack, else
       the default is 10.

     * p can be used to turn the default phrase search into a proximity one
       (unordered). Example: "order any in"p

     * C will turn on case sensitivity (if the index supports it).

     * D will turn on diacritics sensitivity (if the index supports it).

     * A weight can be specified for a query element by specifying a decimal
       value at the start of the modifiers. Example: "Important"2.5.

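   The modifier syntax above can be summarised as: a quoted phrase, then an
   optional decimal weight, then any number of modifier letters, with o
   optionally followed by an integer slack. The sketch below decodes a
   clause along those lines. It is a reading of the description for
   illustration, not the actual Recoll parser; the function name and the
   shape of the returned dictionary are assumptions.

```python
import re

# "phrase", optional weight, then modifier letters ('o' may carry a slack).
CLAUSE_RE = re.compile(r'^"([^"]*)"(\d+(?:\.\d+)?)?((?:o\d*|[lpCD])*)$')


def parse_phrase_clause(clause, default_slack=10):
    """Split a "phrase"modifiers clause into its parts."""
    match = CLAUSE_RE.match(clause)
    if match is None:
        raise ValueError(f"not a phrase clause: {clause!r}")
    phrase, weight, modifiers = match.groups()
    result = {
        "phrase": phrase,
        "weight": float(weight) if weight else None,
        "stemming_off": False,
        "proximity": False,
        "slack": None,
        "case_sensitive": False,
        "diacritics_sensitive": False,
    }
    for mod in re.findall(r"o\d*|[lpCD]", modifiers):
        if mod == "l":
            result["stemming_off"] = True
        elif mod == "p":
            result["proximity"] = True
        elif mod == "C":
            result["case_sensitive"] = True
        elif mod == "D":
            result["diacritics_sensitive"] = True
        else:
            # 'o' alone uses the default slack, 'o3' sets it to 3.
            result["slack"] = int(mod[1:]) if len(mod) > 1 else default_slack
    return result


print(parse_phrase_clause('"order any in"p'))
print(parse_phrase_clause('"Important"2.5'))
```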
3.6. Search case and diacritics sensitivity

   For Recoll versions 1.18 and later, and when working with a raw index (not
...
   All words entered in Recoll search fields will be processed for wildcard
   expansion before the request is finally executed.

   The wildcard characters are:

     * * which matches 0 or more characters.

     * ? which matches a single character.

     * [] which allow defining sets of characters to be matched (ex: [abc]
       matches a single character which may be 'a' or 'b' or 'c', [0-9]
       matches any digit).

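   The three wildcard forms above are the same as those of shell-style
   patterns, so Python's fnmatch module can be used to get a feel for how a
   pattern expands against a list of terms. This is only an analogy:
   Recoll expands wildcards against its own index term list, and the term
   list and function name below are made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical index terms, for illustration only.
INDEX_TERMS = ["recoll", "recollindex", "recollq", "record",
               "report1", "report2", "reportx"]


def expand_wildcard(pattern, terms=INDEX_TERMS):
    """Return the terms matched by a wildcard pattern, in list order."""
    return [term for term in terms if fnmatchcase(term, pattern)]


print(expand_wildcard("recoll*"))      # * matches 0 or more characters
print(expand_wildcard("reco??"))       # ? matches a single character
print(expand_wildcard("report[0-9]"))  # [] matches one character of a set
```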
   You should be aware of a few things when using wildcards.

     * Using a wildcard character at the beginning of a word can make for a
       slow search because Recoll will have to scan the whole index term list
       to find the matches. However, this is much less of a problem for field
       searches, and queries like author:*@domain.com can sometimes be very
       useful.

     * For Recoll version 1.18 only, when working with a raw index (preserving
       character case and diacritics), the literal part of a wildcard
       expression will be matched exactly for case and diacritics. This is
       not true any more for versions 1.19 and later.

     * Using a * at the end of a word can produce more matches than you would
       think, and strange search results. You can use the term explorer tool
       to check what completions exist for a given term. You can also see
       exactly what search was performed by clicking on the link at the top
       of the result list. In general, for natural language terms, stem
       expansion will produce better results than an ending * (stem expansion
...
3.8. Desktop integration

   Being independent of the desktop type has its drawbacks: Recoll desktop
   integration is minimal. However there are a few tools available:

     * The KDE KIO Slave was described in a previous section.

     * If you use a recent version of Ubuntu Linux, you may find the Ubuntu
       Unity Lens module useful.

     * There is also an independently developed Krunner plugin.

   Here follow a few other things that may help.

  3.8.1. Hotkeying recoll

...
   query (in query language form), and an icon which can be used to restrict
   the search to certain types of files. It is quite primitive, and launches
   a new recoll GUI instance every time (even if it is already running). You
   may find it useful anyway.

                        Chapter 4. Programming interface

   Recoll has an Application Programming Interface, usable both for indexing
   and searching, currently accessible from the Python language.

   Another less radical way to extend the application is to write input
...
2408
   kind will not be described here.
2458
   kind will not be described here.
2409
2459
2410
   There are currently (1.18 and since 1.13) two kinds of external executable
2460
   There are currently (1.18 and since 1.13) two kinds of external executable
2411
   input handlers:
2461
   input handlers:
2412
2462
2413
     o Simple exec handlers run once and exit. They can be bare programs like
2463
     * Simple exec handlers run once and exit. They can be bare programs like
2414
       antiword, or scripts using other programs. They are very simple to
2464
       antiword, or scripts using other programs. They are very simple to
2415
       write, because they just need to print the converted document to the
2465
       write, because they just need to print the converted document to the
2416
       standard output. Their output can be plain text or HTML. HTML is
2466
       standard output. Their output can be plain text or HTML. HTML is
2417
       usually preferred because it can store metadata fields and it allows
2467
       usually preferred because it can store metadata fields and it allows
2418
       preserving some of the formatting for the GUI preview.
2468
       preserving some of the formatting for the GUI preview.
2419
2469
2420
     o Multiple execm handlers can process multiple files (sparing the
2470
     * Multiple execm handlers can process multiple files (sparing the
2421
       process startup time which can be very significant), or multiple
2471
       process startup time which can be very significant), or multiple
2422
       documents per file (e.g.: for zip or chm files). They communicate with
2472
       documents per file (e.g.: for zip or chm files). They communicate with
2423
       the indexer through a simple protocol, but are nevertheless a bit more
2473
       the indexer through a simple protocol, but are nevertheless a bit more
2424
       complicated than the older kind. Most new handlers are written in
2474
       complicated than the older kind. Most new handlers are written in
2425
       Python, using a common module to handle the protocol. There is an
2475
       Python, using a common module to handle the protocol. There is an
...
...
2495
2545
2496
   execm handlers sometimes need to make a choice for the nature of the ipath
2546
   execm handlers sometimes need to make a choice for the nature of the ipath
2497
   elements that they use in communication with the indexer. Here are a few
2547
   elements that they use in communication with the indexer. Here are a few
2498
   guidelines:
2548
   guidelines:
2499
2549
2500
     o Use ASCII or UTF-8 (if the identifier is an integer print it, for
2550
     * Use ASCII or UTF-8 (if the identifier is an integer print it, for
2501
       example, like printf %d would do).
2551
       example, like printf %d would do).
2502
2552
2503
     o If at all possible, the data should make some kind of sense when
2553
     * If at all possible, the data should make some kind of sense when
2504
       printed to a log file to help with debugging.
2554
       printed to a log file to help with debugging.
2505
2555
2506
     o Recoll uses a colon (:) as a separator to store a complex path
2556
     * Recoll uses a colon (:) as a separator to store a complex path
2507
       internally (for deeper embedding). Colons inside the ipath elements
2557
       internally (for deeper embedding). Colons inside the ipath elements
2508
       output by a handler will be escaped, but would be a bad choice as a
2558
       output by a handler will be escaped, but would be a bad choice as a
2509
       handler-specific separator (mostly, again, for debugging issues).
2559
       handler-specific separator (mostly, again, for debugging issues).
2510
2560
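   As an illustration of the guidelines above, here is a minimal Python
   sketch of how a handler might build its ipath elements. The function
   names and the "|" separator are hypothetical choices made for this
   example; they are not part of Recoll or of its handler modules.

```python
def make_ipath_element(identifier):
    """Return a printable ipath element for an int or str identifier."""
    if isinstance(identifier, int):
        # Print integers in decimal, like printf %d would do.
        return "%d" % identifier
    text = str(identifier)
    # Fail early if the element cannot be represented as UTF-8.
    text.encode("utf-8")
    return text


def join_subpath(parts, sep="|"):
    """Join handler-internal parts with a separator other than the colon."""
    if sep == ":":
        # The colon is the separator Recoll itself uses for complex paths.
        raise ValueError("pick a handler-specific separator other than ':'")
    return sep.join(make_ipath_element(p) for p in parts)
```

   With these definitions, join_subpath(["chapter", 3]) yields a value
   which stays readable when printed to a log file.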
2511
   In any case, the main goal is that it should be easy for the handler to
2561
   In any case, the main goal is that it should be easy for the handler to
...
...
2546
2596
2547
 application/x-chm = execm rclchm
2597
 application/x-chm = execm rclchm
2548
2598
2549
   The fragment specifies that:
2599
   The fragment specifies that:
2550
2600
2551
     o application/msword files are processed by executing the antiword
2601
     * application/msword files are processed by executing the antiword
2552
       program, which outputs text/plain encoded in utf-8.
2602
       program, which outputs text/plain encoded in utf-8.
2553
2603
2554
     o application/ogg files are processed by the rclogg script, with default
2604
     * application/ogg files are processed by the rclogg script, with default
2555
       output type (text/html, with encoding specified in the header, or
2605
       output type (text/html, with encoding specified in the header, or
2556
       utf-8 by default).
2606
       utf-8 by default).
2557
2607
2558
     o text/rtf is processed by unrtf, which outputs text/html. The
2608
     * text/rtf is processed by unrtf, which outputs text/html. The
2559
       iso-8859-1 encoding is specified because it is not the utf-8 default,
2609
       iso-8859-1 encoding is specified because it is not the utf-8 default,
2560
       and not output by unrtf in the HTML header section.
2610
       and not output by unrtf in the HTML header section.
2561
2611
2562
     o application/x-chm is processed by a persistent handler. This is
2612
     * application/x-chm is processed by a persistent handler. This is
2563
       determined by the execm keyword.
2613
       determined by the execm keyword.
2564
2614
2565
  4.1.4. Input handler HTML output
2615
  4.1.4. Input handler HTML output
2566
2616
2567
   The output HTML could be very minimal like the following example:
2617
   The output HTML could be very minimal like the following example:
...
...
2651
   Recoll defines a number of default fields. Additional ones can be output
2701
   Recoll defines a number of default fields. Additional ones can be output
2652
   by handlers, and described in the fields configuration file.
2702
   by handlers, and described in the fields configuration file.
2653
2703
2654
   Fields can be:
2704
   Fields can be:
2655
2705
2656
     o indexed, meaning that their terms are separately stored in inverted
2706
     * indexed, meaning that their terms are separately stored in inverted
2657
       lists (with a specific prefix), and that a field-specific search is
2707
       lists (with a specific prefix), and that a field-specific search is
2658
       possible.
2708
       possible.
2659
2709
2660
     o stored, meaning that their value is recorded in the index data record
2710
     * stored, meaning that their value is recorded in the index data record
2661
       for the document, and can be returned and displayed with search
2711
       for the document, and can be returned and displayed with search
2662
       results.
2712
       results.
2663
2713
2664
   A field can be either or both indexed and stored. This and other aspects
2714
   A field can be either or both indexed and stored. This and other aspects
2665
   of field handling are defined inside the fields configuration file.
2715
   of field handling are defined inside the fields configuration file.
2666
2716
2667
   The sequence of events for field processing is as follows:
2717
   The sequence of events for field processing is as follows:
2668
2718
2669
     o During indexing, recollindex scans all meta fields in HTML documents
2719
     * During indexing, recollindex scans all meta fields in HTML documents
2670
       (most document types are transformed into HTML at some point). It
2720
       (most document types are transformed into HTML at some point). It
2671
       compares the name for each element to the configuration defining what
2721
       compares the name for each element to the configuration defining what
2672
       should be done with fields (the fields file)
2722
       should be done with fields (the fields file)
2673
2723
2674
     o If the name for the meta element matches one for a field that should
2724
     * If the name for the meta element matches one for a field that should
2675
       be indexed, the contents are processed and the terms are entered into
2725
       be indexed, the contents are processed and the terms are entered into
2676
       the index with the prefix defined in the fields file.
2726
       the index with the prefix defined in the fields file.
2677
2727
2678
     o If the name for the meta element matches one for a field that should
2728
     * If the name for the meta element matches one for a field that should
2679
       be stored, the content of the element is stored with the document data
2729
       be stored, the content of the element is stored with the document data
2680
       record, from which it can be extracted and displayed at query time.
2730
       record, from which it can be extracted and displayed at query time.
2681
2731
2682
     o At query time, if a field search is performed, the index prefix is
2732
     * At query time, if a field search is performed, the index prefix is
2683
       computed and the match is only performed against appropriately
2733
       computed and the match is only performed against appropriately
2684
       prefixed terms in the index.
2734
       prefixed terms in the index.
2685
2735
2686
     o At query time, the field can be displayed inside the result list by
2736
     * At query time, the field can be displayed inside the result list by
2687
       using the appropriate directive in the definition of the result list
2737
       using the appropriate directive in the definition of the result list
2688
       paragraph format. All fields are displayed on the fields screen of the
2738
       paragraph format. All fields are displayed on the fields screen of the
2689
       preview window (which you can reach through the right-click menu).
2739
       preview window (which you can reach through the right-click menu).
2690
       This is independent of the fact that the search which produced the
2740
       This is independent of the fact that the search which produced the
2691
       results used the field or not.
2741
       results used the field or not.
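
   The indexing-time part of the sequence above can be sketched in Python
   as follows. This is only an illustration: the FIELDS dictionary is a
   hypothetical stand-in for the fields configuration file, the prefixes
   are made up, and the term splitting is much cruder than what
   recollindex really does.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the fields file: field name -> index prefix
# (None if the field is not indexed) and whether the value is stored.
FIELDS = {
    "author": {"prefix": "A", "stored": True},
    "keywords": {"prefix": "K", "stored": False},
    "rating": {"prefix": None, "stored": True},
}


class MetaCollector(HTMLParser):
    """Collect (name, content) pairs from the meta elements of a document."""

    def __init__(self):
        super().__init__()
        self.meta = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta.append((name.lower(), content))


def process_fields(html, fields=FIELDS):
    """Return (prefixed index terms, stored values) for one HTML document."""
    collector = MetaCollector()
    collector.feed(html)
    terms, stored = [], {}
    for name, content in collector.meta:
        conf = fields.get(name)
        if conf is None:
            continue  # Not described in the configuration: ignored here.
        if conf["prefix"]:
            # Indexed field: enter each term with the field prefix.
            terms.extend(conf["prefix"] + word.lower() for word in content.split())
        if conf["stored"]:
            # Stored field: keep the raw value with the document data.
            stored[name] = content
    return terms, stored
```

   A field-specific search would then only be matched against the
   prefixed terms, while the stored values are what can be displayed with
   the results.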
...
...
2747
   searching one is used in the Recoll Ubuntu Unity Lens and Recoll Web UI.
2797
   searching one is used in the Recoll Ubuntu Unity Lens and Recoll Web UI.
2748
2798
2749
   The API is inspired by the Python database API specification. There were
2799
   The API is inspired by the Python database API specification. There were
2750
   two major changes in recent Recoll versions:
2800
   two major changes in recent Recoll versions:
2751
2801
2752
     o The basis for the Recoll API changed from Python database API version
2802
     * The basis for the Recoll API changed from Python database API version
2753
       1.0 (Recoll versions up to 1.18.1), to version 2.0 (Recoll 1.18.2 and
2803
       1.0 (Recoll versions up to 1.18.1), to version 2.0 (Recoll 1.18.2 and
2754
       later).
2804
       later).
2755
     o The recoll module became a package (with an internal recoll module) as
2805
     * The recoll module became a package (with an internal recoll module) as
2756
       of Recoll version 1.19, in order to add more functions. For existing
2806
       of Recoll version 1.19, in order to add more functions. For existing
2757
       code, this only changes the way the interface must be imported.
2807
       code, this only changes the way the interface must be imported.
2758
2808
2759
   We will mostly describe the new API and package structure here. A
2809
   We will mostly describe the new API and package structure here. A
2760
   paragraph at the end of this section will explain a few differences and
2810
   paragraph at the end of this section will explain a few differences and
...
...
2780
2830
2781
    4.3.2.2. Recoll package
2831
    4.3.2.2. Recoll package
2782
2832
2783
   The recoll package contains two modules:
2833
   The recoll package contains two modules:
2784
2834
2785
     o The recoll module contains functions and classes used to query (or
2835
     * The recoll module contains functions and classes used to query (or
2786
       update) the index.
2836
       update) the index.
2787
2837
2788
     o The rclextract module contains functions and classes used to access
2838
     * The rclextract module contains functions and classes used to access
2789
       document data.
2839
       document data.
2790
2840
2791
    4.3.2.3. The recoll module
2841
    4.3.2.3. The recoll module
2792
2842
2793
      Functions
2843
      Functions
2794
2844
2795
   connect(confdir=None, extra_dbs=None, writable = False)
2845
   connect(confdir=None, extra_dbs=None, writable = False)
2796
           The connect() function connects to one or several Recoll index(es)
2846
           The connect() function connects to one or several Recoll index(es)
2797
           and returns a Db object.
2847
           and returns a Db object.
2798
              o confdir may specify a configuration directory. The usual
2848
              * confdir may specify a configuration directory. The usual
2799
                defaults apply.
2849
                defaults apply.
2800
              o extra_dbs is a list of additional indexes (Xapian
2850
              * extra_dbs is a list of additional indexes (Xapian
2801
                directories).
2851
                directories).
2802
              o writable decides if we can index new data through this
2852
              * writable decides if we can index new data through this
2803
                connection.
2853
                connection.
2804
           This call initializes the recoll module, and it should always be
2854
           This call initializes the recoll module, and it should always be
2805
           performed before any other call or object creation.
2855
           performed before any other call or object creation.
2806
2856
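   A minimal sketch of a call to connect(), assuming the package layout
   of Recoll 1.19 and later (from recoll import recoll). The helper names
   open_index and build_connect_args are hypothetical and only serve this
   example. Only the parameters which were actually set are passed on, so
   that the usual defaults apply to the others.

```python
def build_connect_args(confdir=None, extra_dbs=None, writable=False):
    """Keep only the connect() parameters which differ from the defaults."""
    kwargs = {}
    if confdir is not None:
        kwargs["confdir"] = confdir      # Configuration directory.
    if extra_dbs:
        kwargs["extra_dbs"] = extra_dbs  # List of additional Xapian directories.
    if writable:
        kwargs["writable"] = True        # Needed to index through this connection.
    return kwargs


def open_index(confdir=None, extra_dbs=None, writable=False):
    """Return a Db object, or None if the recoll package is not installed."""
    try:
        from recoll import recoll
    except ImportError:
        return None
    # connect() must be called before any other call or object creation.
    return recoll.connect(**build_connect_args(confdir, extra_dbs, writable))
```

   Calling open_index() with no argument would connect to the default
   index, read-only.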
2807
      Classes
2857
      Classes
...
...
3045
3095
3046
        rownum = query.next if type(query.next) == int else \
3096
        rownum = query.next if type(query.next) == int else \
3047
                  query.rownumber
3097
                  query.rownumber
3048
3098
3049
3099
3050
Chapter 5. Installation and configuration
3100
                   Chapter 5. Installation and configuration
3051
3101
3052
5.1. Installing a binary copy
3102
5.1. Installing a binary copy
3053
3103
3054
   There are three types of binary Recoll installations:
3104
   There are three types of binary Recoll installations:
3055
3105
3056
     o Through your system's normal software distribution framework (e.g.
3106
     * Through your system's normal software distribution framework (e.g.
3057
       Debian/Ubuntu apt, FreeBSD ports, etc.).
3107
       Debian/Ubuntu apt, FreeBSD ports, etc.).
3058
3108
3059
     o From a package downloaded from the Recoll web site.
3109
     * From a package downloaded from the Recoll web site.
3060
3110
3061
     o From a prebuilt tree downloaded from the Recoll web site.
3111
     * From a prebuilt tree downloaded from the Recoll web site.
3062
3112
3063
   In all cases, the strict software dependencies (e.g. on Xapian or iconv)
3113
   In all cases, the strict software dependencies (e.g. on Xapian or iconv)
3064
   will be automatically satisfied; you should not have to worry about them.
3114
   will be automatically satisfied; you should not have to worry about them.
3065
3115
3066
   You will only have to check or install supporting applications for the
3116
   You will only have to check or install supporting applications for the
...
...
3120
   by ad hoc handler code now use the xsltproc command, which usually comes
3170
   by ad hoc handler code now use the xsltproc command, which usually comes
3121
   with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.
3171
   with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.
3122
3172
3123
   Now for the list:
3173
   Now for the list:
3124
3174
3125
     o Openoffice files need unzip and xsltproc.
3175
     * Openoffice files need unzip and xsltproc.
3126
3176
3127
     o PDF files need pdftotext which is part of the Xpdf or Poppler
3177
     * PDF files need pdftotext which is part of the Xpdf or Poppler
3128
       packages.
3178
       packages.
3129
3179
3130
     o Postscript files need pstotext. The original version has an issue with
3180
     * Postscript files need pstotext. The original version has an issue with
3131
       shell characters in file names, which is corrected in recent packages.
3181
       shell characters in file names, which is corrected in recent packages.
3132
       See http://www.recoll.org/features.html for more detail.
3182
       See http://www.recoll.org/features.html for more detail.
3133
3183
3134
     o MS Word needs antiword. It is also useful to have wvWare installed as
3184
     * MS Word needs antiword. It is also useful to have wvWare installed as
3135
       it may be used as a fallback for some files which antiword does not
3185
       it may be used as a fallback for some files which antiword does not
3136
       handle.
3186
       handle.
3137
3187
3138
     o MS Excel and PowerPoint are processed by internal Python handlers.
3188
     * MS Excel and PowerPoint are processed by internal Python handlers.
3139
3189
3140
     o MS Open XML (docx) needs xsltproc.
3190
     * MS Open XML (docx) needs xsltproc.
3141
3191
3142
     o Wordperfect files need wpd2html from the libwpd (or libwpd-tools on
3192
     * Wordperfect files need wpd2html from the libwpd (or libwpd-tools on
3143
       Ubuntu) package.
3193
       Ubuntu) package.
3144
3194
3145
     o RTF files need unrtf, which, in its standard version, has much trouble
3195
     * RTF files need unrtf, which, in its standard version, has much trouble
3146
       with non-western character sets. Check
3196
       with non-western character sets. Check
3147
       http://www.recoll.org/features.html.
3197
       http://www.recoll.org/features.html.
3148
3198
3149
     o TeX files need untex or detex. Check
3199
     * TeX files need untex or detex. Check
3150
       http://www.recoll.org/features.html for sources if it's not packaged
3200
       http://www.recoll.org/features.html for sources if it's not packaged
3151
       for your distribution.
3201
       for your distribution.
3152
3202
3153
     o dvi files need dvips.
3203
     * dvi files need dvips.
3154
3204
3155
     o djvu files need djvutxt and djvused from the DjVuLibre package.
3205
     * djvu files need djvutxt and djvused from the DjVuLibre package.
3156
3206
3157
     o Audio files: Recoll releases 1.14 and later use a single Python
3207
     * Audio files: Recoll releases 1.14 and later use a single Python
3158
       handler based on mutagen for all audio file types.
3208
       handler based on mutagen for all audio file types.
3159
3209
3160
     o Pictures: Recoll uses the Exiftool Perl package to extract tag
3210
     * Pictures: Recoll uses the Exiftool Perl package to extract tag
3161
       information. Most image file formats are supported. Note that there
3211
       information. Most image file formats are supported. Note that there
3162
       may not be much interest in indexing the technical tags (image size,
3212
       may not be much interest in indexing the technical tags (image size,
3163
       aperture, etc.). This is only of interest if you store personal tags
3213
       aperture, etc.). This is only of interest if you store personal tags
3164
       or textual descriptions inside the image files.
3214
       or textual descriptions inside the image files.
3165
3215
3166
     o chm: files in Microsoft help format need Python and the pychm module
3216
     * chm: files in Microsoft help format need Python and the pychm module
3167
       (which needs chmlib).
3217
       (which needs chmlib).
3168
3218
3169
     o ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar
3219
     * ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar
3170
       module. icalendar is not needed for newer versions, which use internal
3220
       module. icalendar is not needed for newer versions, which use internal
3171
       code.
3221
       code.
3172
3222
3173
     o Zip archives need Python (and the standard zipfile module).
3223
     * Zip archives need Python (and the standard zipfile module).
3174
3224
3175
     o Rar archives need Python, the rarfile Python module and the unrar
3225
     * Rar archives need Python, the rarfile Python module and the unrar
3176
       utility.
3226
       utility.
3177
3227
3178
     o Midi karaoke files need Python and the Midi module.
3228
     * Midi karaoke files need Python and the Midi module.
3179
3229
3180
     o Konqueror webarchive format with Python (uses the Tarfile module).
3230
     * Konqueror webarchive format with Python (uses the Tarfile module).
3181
3231
3182
     o Mimehtml web archive format (support based on the email handler, which
3232
     * Mimehtml web archive format (support based on the email handler, which
3183
       introduces some mild weirdness, but still usable).
3233
       introduces some mild weirdness, but still usable).
3184
3234
3185
   Text, HTML, email folders, and Scribus files are processed internally. Lyx
3235
   Text, HTML, email folders, and Scribus files are processed internally. Lyx
3186
   is used to index Lyx files. Many handlers need iconv and the standard sed
3236
   is used to index Lyx files. Many handlers need iconv and the standard sed
3187
   and awk.
3237
   and awk.
...
...
3196
3246
3197
   You may have to compile Xapian but this is easy.
3247
   You may have to compile Xapian but this is easy.
3198
3248
3199
   The shopping list:
3249
   The shopping list:
3200
3250
3201
     o C++ compiler. Up to Recoll version 1.13.04, its absence can manifest
3251
     * C++ compiler. Up to Recoll version 1.13.04, its absence can manifest
3202
       itself by strange messages about a missing iconv_open.
3252
       itself by strange messages about a missing iconv_open.
3203
3253
3204
     o Development files for Xapian core.
3254
     * Development files for Xapian core.
3205
3255
3206
  Important
3256
  Important
3207
3257
3208
       If you are building Xapian for an older CPU (before Pentium 4 or
3258
       If you are building Xapian for an older CPU (before Pentium 4 or
3209
       Athlon 64), you need to add the --disable-sse flag to the configure
3259
       Athlon 64), you need to add the --disable-sse flag to the configure
3210
       command. Otherwise all Xapian applications will crash with an illegal
3260
       command. Otherwise all Xapian applications will crash with an illegal
3211
       instruction error.
3261
       instruction error.
3212
3262
3213
     o Development files for Qt 4. Recoll has not been tested with Qt 5 yet.
3263
     * Development files for Qt 4. Recoll has not been tested with Qt 5 yet.
3214
       Recoll 1.15.9 was the last version to support Qt 3. If you do not want
3264
       Recoll 1.15.9 was the last version to support Qt 3. If you do not want
3215
       to install or build the Qt Webkit module, Recoll has a configuration
3265
       to install or build the Qt Webkit module, Recoll has a configuration
3216
       option to disable its use (see further).
3266
       option to disable its use (see further).
3217
3267
3218
     o Development files for X11 and zlib.
3268
     * Development files for X11 and zlib.
3219
3269
3220
     o You may also need libiconv. On Linux systems, the iconv interface is
3270
     * You may also need libiconv. On Linux systems, the iconv interface is
3221
       part of libc and you should not need to do anything special.
3271
       part of libc and you should not need to do anything special.
3222
3272
3223
   Check the Recoll download page for up to date version information.
3273
   Check the Recoll download page for up to date version information.
3224
3274
3225
  5.3.2. Building
3275
  5.3.2. Building
...
...
3229
   ok). If you build on another system, and need to modify things, I would
3279
   ok). If you build on another system, and need to modify things, I would
3230
   very much welcome patches.
3280
   very much welcome patches.
3231
3281
3232
   Configure options: 
3282
   Configure options: 
3233
3283
3234
     o --without-aspell will disable the code for phonetic matching of search
3284
     * --without-aspell will disable the code for phonetic matching of search
3235
       terms.
3285
       terms.
3236
3286
3237
     o --with-fam or --with-inotify will enable the code for real time
3287
     * --with-fam or --with-inotify will enable the code for real time
3238
       indexing. Inotify support is enabled by default on recent Linux
3288
       indexing. Inotify support is enabled by default on recent Linux
3239
       systems.
3289
       systems.
3240
3290
3241
     o --with-qzeitgeist will enable sending Zeitgeist events about the
3291
     * --with-qzeitgeist will enable sending Zeitgeist events about the
3242
       visited search results, and needs the qzeitgeist package.
3292
       visited search results, and needs the qzeitgeist package.
3243
3293
3244
     o --disable-webkit is available from version 1.17 to implement the
3294
     * --disable-webkit is available from version 1.17 to implement the
3245
       result list with a Qt QTextBrowser instead of a WebKit widget if you
3295
       result list with a Qt QTextBrowser instead of a WebKit widget if you
3246
       do not or can't depend on the latter.
3296
       do not or can't depend on the latter.
3247
3297
3248
     o --disable-idxthreads is available from version 1.19 to suppress
3298
     * --disable-idxthreads is available from version 1.19 to suppress
3249
       multithreading inside the indexing process. You can also use the
3299
       multithreading inside the indexing process. You can also use the
3250
       run-time configuration to restrict recollindex to using a single
3300
       run-time configuration to restrict recollindex to using a single
3251
       thread, but the compile-time option may disable a few more unused
3301
       thread, but the compile-time option may disable a few more unused
3252
       locks. This only applies to the use of multithreading for the core
3302
       locks. This only applies to the use of multithreading for the core
3253
       index processing (data input). The Recoll monitor mode always uses at
3303
       index processing (data input). The Recoll monitor mode always uses at
3254
       least two threads of execution.
3304
       least two threads of execution.
3255
3305
3256
     o --disable-python-module will avoid building the Python module.
3306
     * --disable-python-module will avoid building the Python module.
3257
3307
3258
     o --disable-xattr will prevent fetching data from file extended
3308
     * --disable-xattr will prevent fetching data from file extended
3259
       attributes. Beyond a few standard attributes, fetching extended
3309
       attributes. Beyond a few standard attributes, fetching extended
3260
       attributes data can only be useful if some application stores data in
3310
       attributes data can only be useful if some application stores data in
3261
       there, and also needs some simple configuration (see comments in the
3311
       there, and also needs some simple configuration (see comments in the
3262
       fields configuration file).
3312
       fields configuration file).
3263
3313
3264
     o --enable-camelcase will enable splitting camelCase words. This is not
3314
     * --enable-camelcase will enable splitting camelCase words. This is not
3265
       enabled by default as it has the unfortunate side-effect of making
3315
       enabled by default as it has the unfortunate side-effect of making
3266
       some phrase searches quite confusing: e.g., "MySQL manual" would be
3316
       some phrase searches quite confusing: e.g., "MySQL manual" would be
3267
       matched by "MySQL manual" and "my sql manual" but not "mysql manual"
3317
       matched by "MySQL manual" and "my sql manual" but not "mysql manual"
3268
       (only inside phrase searches).
3318
       (only inside phrase searches).
3269
3319
3270
     o --with-file-command Specify the version of the 'file' command to use
3320
     * --with-file-command Specify the version of the 'file' command to use
3271
       (e.g. --with-file-command=/usr/local/bin/file). Can be useful to enable
3321
       (e.g. --with-file-command=/usr/local/bin/file). Can be useful to enable
3272
       the gnu version on systems where the native one is bad.
3322
       the gnu version on systems where the native one is bad.
3273
3323
3274
     o --disable-qtgui Disable the Qt interface. Will allow building the
3324
     * --disable-qtgui Disable the Qt interface. Will allow building the
3275
       indexer and the command line search program in absence of a Qt
3325
       indexer and the command line search program in absence of a Qt
3276
       environment.
3326
       environment.
3277
3327
3278
     o --disable-x11mon Disable X11 connection monitoring inside recollindex.
3328
     * --disable-x11mon Disable X11 connection monitoring inside recollindex.
3279
       Together with --disable-qtgui, this allows building recoll without Qt
3329
       Together with --disable-qtgui, this allows building recoll without Qt
3280
       and X11.
3330
       and X11.
3281
3331
3282
     o --disable-pic will compile Recoll with position-dependent code. This
3332
     * --disable-pic will compile Recoll with position-dependent code. This
3283
       is incompatible with building the KIO or the Python or PHP extensions,
3333
       is incompatible with building the KIO or the Python or PHP extensions,
3284
       but might yield very marginally faster code.
3334
       but might yield very marginally faster code.
3285
3335
3286
     o Of course, the usual autoconf configure options, like --prefix, apply.
3336
     * Of course, the usual autoconf configure options, like --prefix, apply.
3287
3337
3288
   Normal procedure:
3338
   Normal procedure:
3289
3339
3290
         cd recoll-xxx
3340
         cd recoll-xxx
3291
         configure
3341
         configure
...
...
3387
         defaultcharset = utf-8
3437
         defaultcharset = utf-8
3388
        
3438
        
3389
3439
3390
   There are three kinds of lines:
3440
   There are three kinds of lines:
3391
3441
3392
     o Comment (starts with #) or empty.
3442
     * Comment (starts with #) or empty.
3393
3443
3394
     o Parameter assignment (name = value).
3444
     * Parameter assignment (name = value).
3395
3445
3396
     o Section definition ([somedirname]).
3446
     * Section definition ([somedirname]).
3397
3447
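   The three kinds of lines can be told apart with very little code. The
   following Python sketch is an illustration only, not Recoll's own
   parser: it handles nothing beyond the three line kinds listed above,
   and the function names are hypothetical.

```python
def classify_line(line):
    """Return (kind, data) for one configuration line."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return ("comment", None)  # Comment or empty line.
    if stripped.startswith("[") and stripped.endswith("]"):
        return ("section", stripped[1:-1].strip())  # [somedirname]
    if "=" in stripped:
        name, value = stripped.split("=", 1)
        return ("assignment", (name.strip(), value.strip()))  # name = value
    return ("unknown", stripped)


def parse_config(text):
    """Return {section: {name: value}}; top-level parameters use section ''."""
    result = {"": {}}
    current = ""
    for line in text.splitlines():
        kind, data = classify_line(line)
        if kind == "section":
            # A section stays in effect until the next one or the end of file.
            current = data
            result.setdefault(current, {})
        elif kind == "assignment":
            name, value = data
            result[current][name] = value
    return result
```

   For example, a defaultcharset value set after a [/home/me/docs] line
   would end up under the "/home/me/docs" key, separate from the
   top-level value.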
3398
   Depending on the type of configuration file, section definitions either
3448
   Depending on the type of configuration file, section definitions either
3399
   separate groups of parameters or allow redefining some parameters for a
3449
   separate groups of parameters or allow redefining some parameters for a
3400
   directory sub-tree. They stay in effect until another section definition,
3450
   directory sub-tree. They stay in effect until another section definition,
3401
   or the end of file, is encountered. Some of the parameters used for
3451
   or the end of file, is encountered. Some of the parameters used for
...
...
3410
   embedded spaces can be quoted using double-quotes.
3460
   embedded spaces can be quoted using double-quotes.
3411
3461
3412
   Encoding issues. Most of the configuration parameters are plain ASCII. Two
3462
   Encoding issues. Most of the configuration parameters are plain ASCII. Two
3413
   particular sets of values may cause encoding issues:
3463
   particular sets of values may cause encoding issues:
3414
3464
3415
     o File path parameters may contain non-ascii characters and should use
3465
     * File path parameters may contain non-ascii characters and should use
3416
       the exact same byte values as found in the file system directory.
3466
       the exact same byte values as found in the file system directory.
3417
       Usually, this means that the configuration file should use the system
3467
       Usually, this means that the configuration file should use the system
3418
       default locale encoding.
3468
       default locale encoding.
3419
3469
3420
     o The unac_except_trans parameter should be encoded in UTF-8. If your
3470
     * The unac_except_trans parameter should be encoded in UTF-8. If your
3421
       system locale is not UTF-8, and you need to also specify non-ascii
3471
       system locale is not UTF-8, and you need to also specify non-ascii
3422
       file paths, this poses a difficulty because common text editors cannot
3472
       file paths, this poses a difficulty because common text editors cannot
3423
       handle multiple encodings in a single file. In this relatively
3473
       handle multiple encodings in a single file. In this relatively
3424
       unlikely case, you can edit the configuration file as two separate
3474
       unlikely case, you can edit the configuration file as two separate
3425
       text files with appropriate encodings, and concatenate them to create
3475
       text files with appropriate encodings, and concatenate them to create
...
...
3570
           indexing, or for all files inside the selected subtrees,
3620
           indexing, or for all files inside the selected subtrees,
3571
           independently of MIME type.
3621
           independently of MIME type.
3572
3622
3573
   usesystemfilecommand
3623
   usesystemfilecommand
3574
3624
3575
           Decide if we use the file -i system command as a final step for
3625
           Decide if we execute a system command (file -i by default) as a
3576
           determining the MIME type for a file (the main procedure uses
3626
           final step for determining the MIME type for a file (the main
3577
           suffix associations as defined in the mimemap file). This can be
3627
           procedure uses suffix associations as defined in the mimemap
3578
           useful for files with suffix-less names, but it will also cause
3628
           file). This can be useful for files with suffix-less names, but it
3579
           the indexing of many bogus "text" files.
3629
           will also cause the indexing of many bogus "text" files.
3630
3631
   systemfilecommand
3632
3633
           Command to use for MIME type determination if
3634
           usesystemfilecommand is set. Recent versions of xdg-mime sometimes
3635
           work better than file.
3580
3636
3581
   processwebqueue
3637
   processwebqueue
3582
3638
3583
           If this is set, process the directory where Web browser plugins
3639
           If this is set, process the directory where Web browser plugins
3584
           copy visited pages for indexing.
3640
           copy visited pages for indexing.
...
...
3996
   The fields file has several sections, which each define an aspect of
4052
   The fields file has several sections, which each define an aspect of
3997
   fields processing. Quite often, you'll have to modify several sections to
4053
   fields processing. Quite often, you'll have to modify several sections to
3998
   obtain the desired behaviour.
4054
   obtain the desired behaviour.
3999
4055
4000
   We will only give a short description here, you should refer to the
4056
   We will only give a short description here, you should refer to the
4001
   comments inside the file for more detailed information.
4057
   comments inside the default file for more detailed information.
4002
4058
4003
   Field names should be lowercase alphabetic ASCII.
4059
   Field names should be lowercase alphabetic ASCII.
4004
4060
4005
   [prefixes]
4061
   [prefixes]
4006
4062
...
...
4014
4070
4015
   [aliases]
4071
   [aliases]
4016
4072
4017
           This section defines lists of synonyms for the canonical names
4073
           This section defines lists of synonyms for the canonical names
4018
           used inside the [prefixes] and [stored] sections
4074
           used inside the [prefixes] and [stored] sections
4075
4076
   [queryaliases]
4077
4078
           This section also defines aliases for the canonical field names,
4079
           with the difference that the substitution will only be used at
4080
           query time, avoiding any possibility that the value would pick up
4081
           random metadata from documents.
4019
4082
4020
   handler-specific sections
4083
   handler-specific sections
4021
4084
4022
           Some input handlers may need specific configuration for handling
4085
           Some input handlers may need specific configuration for handling
4023
           fields. Only the email message handler currently has such a
4086
           fields. Only the email message handler currently has such a
...
...
4037
4100
4038
 [stored]
4101
 [stored]
4039
 # Store mailmytag inside the document data record (so that it can be
4102
 # Store mailmytag inside the document data record (so that it can be
4040
 # displayed - as %(mailmytag) - in result lists).
4103
 # displayed - as %(mailmytag) - in result lists).
4041
 mailmytag =
4104
 mailmytag =
4105
4106
 [queryaliases]
4107
 filename = fn
4108
 containerfilename = cfn
4042
4109
4043
 [mail]
4110
 [mail]
4044
 # Extract the X-My-Tag mail header, and use it internally with the
4111
 # Extract the X-My-Tag mail header, and use it internally with the
4045
 # mailmytag field name
4112
 # mailmytag field name
4046
 x-my-tag = mailmytag
4113
 x-my-tag = mailmytag
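
   To make the section layout above more concrete, here is a minimal
   Python sketch that reads the same kind of text into per-section
   dictionaries and builds a query-time alias lookup. It is only an
   illustration: Recoll uses its own configuration parser, and the names
   SAMPLE, load_sections and alias_map are hypothetical. The sample is
   written without leading indentation because Python's configparser
   treats indented lines as continuations. Splitting alias lists on
   whitespace is also an assumption.

```python
import configparser

# Same content as the fields example above, without leading indentation.
SAMPLE = """\
[stored]
# Store mailmytag inside the document data record
mailmytag =

[queryaliases]
filename = fn
containerfilename = cfn

[mail]
x-my-tag = mailmytag
"""


def load_sections(text):
    """Return {section name: {key: value}} for a fields-style text."""
    parser = configparser.ConfigParser(
        interpolation=None,        # leave any % characters in values untouched
        delimiters=("=",),
        comment_prefixes=("#",),
    )
    parser.optionxform = str       # keep key names exactly as written
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


def alias_map(section):
    """Map each alias to its canonical field name.

    Assumes the right-hand side is a whitespace-separated alias list.
    """
    mapping = {}
    for canonical, aliases in section.items():
        for alias in aliases.split():
            mapping[alias] = canonical
    return mapping


sections = load_sections(SAMPLE)
query_aliases = alias_map(sections["queryaliases"])
```

   With this sketch, query_aliases maps fn to filename and cfn to
   containerfilename, and sections["stored"]["mailmytag"] is an empty
   string.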
...
...
4131
   mydoc.doc.gz).
4198
   mydoc.doc.gz).
4132
4199
4133
   The right side of each assignment holds a command to be executed for
4200
   The right side of each assignment holds a command to be executed for
4134
   opening the file. The following substitutions are performed:
4201
   opening the file. The following substitutions are performed:
4135
4202
4136
     o %D. Document date
4203
     * %D. Document date
4137
4204
4138
     o %f. File name. This may be the name of a temporary file if it was
4205
     * %f. File name. This may be the name of a temporary file if it was
4139
       necessary to create one (ie: to extract a subdocument from a
4206
       necessary to create one (ie: to extract a subdocument from a
4140
       container).
4207
       container).
4141
4208
4142
     o %F. Original file name. Same as %f except if a temporary file is used.
4143
4144
     o %i. Internal path, for subdocuments of containers. The format depends
4209
     * %i. Internal path, for subdocuments of containers. The format depends
4145
       on the container type. If this appears in the command line, Recoll
4210
       on the container type. If this appears in the command line, Recoll
4146
       will not create a temporary file to extract the subdocument, expecting
4211
       will not create a temporary file to extract the subdocument, expecting
4147
       the called application (possibly a script) to be able to handle it.
4212
       the called application (possibly a script) to be able to handle it.
4148
4213
4149
     o %M. MIME type
4214
     * %M. MIME type
4150
4215
4151
     o %p. Page index. Only significant for a subset of document types,
4216
     * %p. Page index. Only significant for a subset of document types,
4152
       currently only PDF, Postscript and DVI files. Can be used to start the
4217
       currently only PDF, Postscript and DVI files. Can be used to start the
4153
       editor at the right page for a match or snippet.
4218
       editor at the right page for a match or snippet.
4154
4219
4155
     o %s. Search term. The value will only be set for documents with indexed
4220
     * %s. Search term. The value will only be set for documents with indexed
4156
       page numbers (ie: PDF). The value will be one of the matched search
4221
       page numbers (ie: PDF). The value will be one of the matched search
4157
       terms. It would allow pre-setting the value in the "Find" entry inside
4222
       terms. It would allow pre-setting the value in the "Find" entry inside
4158
       Evince for example, for easy highlighting of the term.
4223
       Evince for example, for easy highlighting of the term.
4159
4224
4160
     o %U, %u. Url.
4225
     * %u. Url.
4161
4226
4162
   In addition to the predefined values above, all strings like %(fieldname)
4227
   In addition to the predefined values above, all strings like %(fieldname)
4163
   will be replaced by the value of the field named fieldname for the
4228
   will be replaced by the value of the field named fieldname for the
4164
   document. This could be used in combination with field customisation to
4229
   document. This could be used in combination with field customisation to
4165
   help with opening the document.
4230
   help with opening the document.
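
   The substitutions described above can be sketched in Python as
   follows. This is not Recoll's implementation, only an illustration of
   the idea: the function name expand_viewer_command and its arguments
   are hypothetical, and the choices to replace a missing field with an
   empty string and to leave unknown %X sequences untouched are
   assumptions.

```python
import re

# Matches either %(fieldname) or a single-letter placeholder such as %f.
_PLACEHOLDER = re.compile(r"%(?:\((?P<field>[^)]*)\)|(?P<letter>[A-Za-z]))")


def expand_viewer_command(template, values, fields):
    """Expand placeholders in a mimeview-style command template.

    values: single-letter substitutions, e.g. {"f": "/tmp/x.blob"}
    fields: document field values, e.g. {"mailmytag": "urgent"}
    """
    def replace(match):
        name = match.group("field")
        if name is not None:
            # %(fieldname): assumed to expand to "" when the field is absent
            return fields.get(name, "")
        letter = match.group("letter")
        # Unknown letters are left as written (an assumption)
        return values.get(letter, match.group(0))

    return _PLACEHOLDER.sub(replace, template)


command = expand_viewer_command(
    "blobviewer --page %p --tag %(mailmytag) %f",
    {"f": "/tmp/x.blob", "p": "3"},
    {"mailmytag": "urgent"},
)
```

   The blobviewer options shown here are invented for the example. A
   real launcher would also have to quote the substituted values before
   passing the command to a shell.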
...
...
4192
   the result list (when found by file name). The file names end in .blob and
4257
   the result list (when found by file name). The file names end in .blob and
4193
   can be displayed by application blobviewer.
4258
   can be displayed by application blobviewer.
4194
4259
4195
   You need two entries in the configuration files for this to work:
4260
   You need two entries in the configuration files for this to work:
4196
4261
4197
     o In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the
4262
     * In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the
4198
       following line:
4263
       following line:
4199
4264
4200
 .blob = application/x-blobapp
4265
 .blob = application/x-blobapp
4201
4266
4202
       Note that the MIME type is made up here, and you could call it
4267
       Note that the MIME type is made up here, and you could call it
4203
       diesel/oil just the same.
4268
       diesel/oil just the same.
4204
4269
4205
     o In $RECOLL_CONFDIR/mimeview under the [view] section, add:
4270
     * In $RECOLL_CONFDIR/mimeview under the [view] section, add:
4206
4271
4207
 application/x-blobapp = blobviewer %f
4272
 application/x-blobapp = blobviewer %f
4208
4273
4209
       We are supposing that blobviewer wants a file name parameter here, you
4274
       We are supposing that blobviewer wants a file name parameter here, you
4210
       would use %u if it liked URLs better.
4275
       would use %u if it liked URLs better.
...
...
4221
   text and that you know how to extract it with a command line program.
4286
   text and that you know how to extract it with a command line program.
4222
   Getting Recoll to index the files is easy. You need to perform the above
4287
   Getting Recoll to index the files is easy. You need to perform the above
4223
   alteration, and also to add data to the mimeconf file (typically in
4288
   alteration, and also to add data to the mimeconf file (typically in
4224
   ~/.recoll/mimeconf):
4289
   ~/.recoll/mimeconf):
4225
4290
4226
     o Under the [index] section, add the following line (more about the
4291
     * Under the [index] section, add the following line (more about the
4227
       rclblob indexing script later):
4292
       rclblob indexing script later):
4228
4293
4229
 application/x-blobapp = exec rclblob
4294
 application/x-blobapp = exec rclblob
4230
4295
4231
     o Under the [icons] section, you should choose an icon to be displayed
4296
     * Under the [icons] section, you should choose an icon to be displayed
4232
       for the files inside the result lists. Icons are normally 64x64 pixels
4297
       for the files inside the result lists. Icons are normally 64x64 pixels
4233
       PNG files which live in /usr/[local/]share/recoll/images.
4298
       PNG files which live in /usr/[local/]share/recoll/images.
4234
4299
4235
     o Under the [categories] section, you should add the MIME type where it
4300
     * Under the [categories] section, you should add the MIME type where it
4236
       makes sense (you can also create a category). Categories may be used
4301
       makes sense (you can also create a category). Categories may be used
4237
       for filtering in advanced search.
4302
       for filtering in advanced search.
4238
4303
4239
   The rclblob handler should be an executable program or script which exists
4304
   The rclblob handler should be an executable program or script which exists
4240
   inside /usr/[local/]share/recoll/filters. It will be given a file name as
4305
   inside /usr/[local/]share/recoll/filters. It will be given a file name as