|
a/src/README |
|
b/src/README |
|
... |
|
... |
100 |
|
100 |
|
101 |
6.1. Writing a document filter
|
101 |
6.1. Writing a document filter
|
102 |
|
102 |
|
103 |
6.1.1. Filter HTML output
|
103 |
6.1.1. Filter HTML output
|
104 |
|
104 |
|
105 |
6.2. Field data processing configuration
|
105 |
6.2. Field data processing
|
106 |
|
106 |
|
107 |
6.3. API
|
107 |
6.3. API
|
108 |
|
108 |
|
109 |
6.3.1. Interface elements
|
109 |
6.3.1. Interface elements
|
110 |
|
110 |
|
|
... |
|
... |
130 |
|
130 |
|
131 |
7.4. Configuration overview
|
131 |
7.4. Configuration overview
|
132 |
|
132 |
|
133 |
7.4.1. Main configuration file
|
133 |
7.4.1. Main configuration file
|
134 |
|
134 |
|
|
|
135 |
7.4.2. The fields file
|
|
|
136 |
|
135 |
7.4.2. The mimemap file
|
137 |
7.4.3. The mimemap file
|
136 |
|
138 |
|
137 |
7.4.3. The mimeconf file
|
139 |
7.4.4. The mimeconf file
|
138 |
|
140 |
|
139 |
7.4.4. The mimeview file
|
141 |
7.4.5. The mimeview file
|
140 |
|
142 |
|
141 |
7.4.5. Examples of configuration adjustments
|
143 |
7.4.6. Examples of configuration adjustments
|
142 |
|
144 |
|
143 |
7.5. The KDE Kicker Recoll applet
|
145 |
7.5. The KDE Kicker Recoll applet
|
144 |
|
146 |
|
145 |
----------------------------------------------------------------------
|
147 |
----------------------------------------------------------------------
|
146 |
|
148 |
|
|
... |
|
... |
865 |
|
867 |
|
866 |
* dir for filtering the results on file location (Ex:
|
868 |
* dir for filtering the results on file location (Ex:
|
867 |
dir:/home/me/somedir). Please note that this is quite inefficient,
|
869 |
dir:/home/me/somedir). Please note that this is quite inefficient,
|
868 |
that it may produce very slow searches, and that it may be worth in
|
870 |
that it may produce very slow searches, and that it may be worth in
|
869 |
some cases to set up separate databases instead.
|
871 |
some cases to set up separate databases instead.
|
|
|
872 |
|
|
|
873 |
* date for searching or filtering on dates. The syntax for the argument
|
|
|
874 |
is based on the ISO8601 standard for dates and time intervals. Only
|
|
|
875 |
dates are supported, no times. The general syntax is 2 elements
|
|
|
876 |
separated by a / character. Each element can be a date or a period of
|
|
|
877 |
time. Periods are specified as PnYnMnD. The n numbers are the
|
|
|
878 |
respective numbers of years, months or days, any of which may be
|
|
|
879 |
missing. Dates are specified as YYYY-MM-DD. The days and months parts
|
|
|
880 |
may be missing. If the / is present but an element is missing, the
|
|
|
881 |
missing element is interpreted as the lowest or highest date in the
|
|
|
882 |
index. Examples:
|
|
|
883 |
|
|
|
884 |
* 2001-03-01/2002-05-01 the basic syntax for an interval of dates.
|
|
|
885 |
|
|
|
886 |
* 2001-03-01/P1Y2M the same specified with a period.
|
|
|
887 |
|
|
|
888 |
* 2001/ from the beginning of 2001 to the latest date in the index.
|
|
|
889 |
|
|
|
890 |
* 2001 the whole year of 2001
|
|
|
891 |
|
|
|
892 |
* P2D/ means 2 days ago up to now if there are no documents with
|
|
|
893 |
dates in the future.
|
|
|
894 |
|
|
|
895 |
* /2003 all documents from 2003 or older.
|
|
|
896 |
|
|
|
897 |
Periods can also be specified with small letters (ie: p2y).
|
870 |
|
898 |
|
871 |
* mime or format for specifying the mime type. This one is quite special
|
899 |
* mime or format for specifying the mime type. This one is quite special
|
872 |
because you can specify several values which will be OR'ed (the normal
|
900 |
because you can specify several values which will be OR'ed (the normal
|
873 |
default for the language is AND). Ex: mime:text/plain mime:text/html.
|
901 |
default for the language is AND). Ex: mime:text/plain mime:text/html.
|
874 |
Specifying an explicit boolean operator or negation (-) before a mime
|
902 |
Specifying an explicit boolean operator or negation (-) before a mime
|
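The date clause syntax described in this hunk can be illustrated with a small parser. This is only a sketch under stated assumptions: the function names (parse_element, parse_date_clause) and the tuple layout are hypothetical and are not taken from the Recoll sources. It only classifies each side of the interval and does not resolve open ends against an index.

```python
import re

# YYYY, YYYY-MM or YYYY-MM-DD (the month and day parts may be missing).
DATE_RE = re.compile(r"^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?$")
# PnYnMnD, any part may be missing; small letters are accepted too.
PERIOD_RE = re.compile(r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?$", re.IGNORECASE)


def parse_element(text):
    """Classify one side of the interval (hypothetical helper)."""
    if text == "":
        # Missing element: lowest or highest date in the index.
        return ("open",)
    m = DATE_RE.match(text)
    if m:
        year, month, day = (int(g) if g else None for g in m.groups())
        return ("date", year, month, day)
    m = PERIOD_RE.match(text)
    if m and any(m.groups()):
        years, months, days = (int(g) if g else 0 for g in m.groups())
        return ("period", years, months, days)
    raise ValueError("not a date or a period: %r" % text)


def parse_date_clause(arg):
    """Split the argument of a date: clause into (left, right).

    right is None when there is no '/' at all (ie: '2001').
    """
    if "/" not in arg:
        return (parse_element(arg), None)
    left, right = arg.split("/", 1)
    return (parse_element(left), parse_element(right))
```

For instance, parse_date_clause("2001-03-01/P1Y2M") yields a date element followed by a period element, and parse_date_clause("/2003") yields an open start followed by the year 2003.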
|
... |
|
... |
1154 |
search entry field.
|
1182 |
search entry field.
|
1155 |
|
1183 |
|
1156 |
Wildcards. Wildcards can be used inside search terms in all forms of
|
1184 |
Wildcards. Wildcards can be used inside search terms in all forms of
|
1157 |
searches. More about wildcards.
|
1185 |
searches. More about wildcards.
|
1158 |
|
1186 |
|
|
|
1187 |
Automatic suffixes. Words like odt or ods can be automatically turned into
|
|
|
1188 |
query language ext:xxx clauses. This can be enabled in the Search
|
|
|
1189 |
preferences panel in the GUI.
|
|
|
1190 |
|
1159 |
Disabling stem expansion. Entering a capitalized word in any search field
|
1191 |
Disabling stem expansion. Entering a capitalized word in any search field
|
1160 |
will prevent stem expansion (no search for gardening if you enter Garden
|
1192 |
will prevent stem expansion (no search for gardening if you enter Garden
|
1161 |
instead of garden). This is the only case where character case should make
|
1193 |
instead of garden). This is the only case where character case should make
|
1162 |
a difference for a Recoll search. You can also disable stem expansion or
|
1194 |
a difference for a Recoll search. You can also disable stem expansion or
|
1163 |
change the stemming language in the preferences.
|
1195 |
change the stemming language in the preferences.
|
|
... |
|
... |
1319 |
document abstracts when displaying the result list. Abstracts are
|
1351 |
document abstracts when displaying the result list. Abstracts are
|
1320 |
constructed by taking context from the document information, around
|
1352 |
constructed by taking context from the document information, around
|
1321 |
the search terms. This can slow down result list display significantly
|
1353 |
the search terms. This can slow down result list display significantly
|
1322 |
for big documents, and you may want to turn it off.
|
1354 |
for big documents, and you may want to turn it off.
|
1323 |
|
1355 |
|
1324 |
* Replace abstracts from documents: this decides if we should synthesize
|
|
|
1325 |
and display an abstract in place of an explicit abstract found within
|
|
|
1326 |
the document itself.
|
|
|
1327 |
|
|
|
1328 |
* Synthetic abstract size: adjust to taste...
|
1356 |
* Synthetic abstract size: adjust to taste...
|
1329 |
|
1357 |
|
1330 |
* Synthetic abstract context words: how many words should be displayed
|
1358 |
* Synthetic abstract context words: how many words should be displayed
|
1331 |
around each term occurrence.
|
1359 |
around each term occurrence.
|
|
|
1360 |
|
|
|
1361 |
* Query language magic file name suffixes: a list of words which
|
|
|
1362 |
automatically get turned into ext:xxx file name suffix clauses when
|
|
|
1363 |
starting a query language query (ie: doc xls xlsx...). This will save
|
|
|
1364 |
some typing for people who use file types a lot when querying.
|
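The effect of the magic suffixes list can be pictured with the following sketch. It assumes that any word of the query found in the list is rewritten, wherever it appears; the real GUI code may well behave differently, and the function name apply_magic_suffixes is hypothetical.

```python
def apply_magic_suffixes(query, suffixes):
    """Rewrite listed words into ext:xxx clauses (illustration only)."""
    wanted = {s.lower() for s in suffixes}
    out = []
    for word in query.split():
        if word.lower() in wanted:
            # The word is in the configured list: turn it into a suffix clause.
            out.append("ext:" + word.lower())
        else:
            out.append(word)
    return " ".join(out)


print(apply_magic_suffixes("budget xls 2010", ["doc", "xls", "xlsx"]))
# prints: budget ext:xls 2010
```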
1332 |
|
1365 |
|
1333 |
External indexes: This panel will let you browse for additional indexes
|
1366 |
External indexes: This panel will let you browse for additional indexes
|
1334 |
that you may want to search. External indexes are designated by their
|
1367 |
that you may want to search. External indexes are designated by their
|
1335 |
database directory (ie: /home/someothergui/.recoll/xapiandb,
|
1368 |
database directory (ie: /home/someothergui/.recoll/xapiandb,
|
1336 |
/usr/local/recollglobal/xapiandb).
|
1369 |
/usr/local/recollglobal/xapiandb).
|
|
... |
|
... |
1648 |
See the following section for details about configuring how field data is
|
1681 |
See the following section for details about configuring how field data is
|
1649 |
processed by the indexer.
|
1682 |
processed by the indexer.
|
1650 |
|
1683 |
|
1651 |
----------------------------------------------------------------------
|
1684 |
----------------------------------------------------------------------
|
1652 |
|
1685 |
|
1653 |
6.2. Field data processing configuration
|
1686 |
6.2. Field data processing
|
1654 |
|
1687 |
|
1655 |
Fields are named pieces of information in or about documents, like title,
|
1688 |
Fields are named pieces of information in or about documents, like title,
|
1656 |
author, abstract.
|
1689 |
author, abstract.
|
1657 |
|
1690 |
|
1658 |
The field values for documents can appear in several ways during indexing:
|
1691 |
The field values for documents can appear in several ways during indexing:
|
|
... |
|
... |
1673 |
|
1706 |
|
1674 |
* stored, meaning that their value is recorded in the index data record
|
1707 |
* stored, meaning that their value is recorded in the index data record
|
1675 |
for the document, and can be returned and displayed with search
|
1708 |
for the document, and can be returned and displayed with search
|
1676 |
results.
|
1709 |
results.
|
1677 |
|
1710 |
|
1678 |
A field can be either or both indexed and stored.
|
1711 |
A field can be either or both indexed and stored. This and other aspects
|
|
|
1712 |
of fields handling are defined inside the fields configuration file.
|
1679 |
|
1713 |
|
1680 |
A field becomes indexed by having a prefix defined in the [prefixes]
|
1714 |
You can find more information in the section about the fields file, or in
|
1681 |
section of the fields file. See the comments in there for details
|
1715 |
comments inside the file.
|
1682 |
|
|
|
1683 |
A field becomes stored by appearing in the [stored] section of the fields
|
|
|
1684 |
file.
|
|
|
1685 |
|
|
|
1686 |
See the comments inside the fields for more details.
|
|
|
1687 |
|
1716 |
|
1688 |
----------------------------------------------------------------------
|
1717 |
----------------------------------------------------------------------
|
1689 |
|
1718 |
|
1690 |
6.3. API
|
1719 |
6.3. API
|
1691 |
|
1720 |
|
|
... |
|
... |
2039 |
|
2068 |
|
2040 |
After an indexing pass, the commands that were found missing can be
|
2069 |
After an indexing pass, the commands that were found missing can be
|
2041 |
displayed from the recoll File menu. The list is stored in the missing
|
2070 |
displayed from the recoll File menu. The list is stored in the missing
|
2042 |
text file inside the configuration directory.
|
2071 |
text file inside the configuration directory.
|
2043 |
|
2072 |
|
2044 |
A list of common file types which need external commands:
|
2073 |
A list of common file types which need external commands follows. Many of
|
|
|
2074 |
the filters need the iconv command, which is not always listed as a
|
|
|
2075 |
dependency.
|
|
|
2076 |
|
|
|
2077 |
As of Recoll release 1.14, a number of XML-based formats that were handled
|
|
|
2078 |
by ad hoc filter code now use xsltproc, which usually comes with libxslt.
|
|
|
2079 |
These are: abiword, fb2 (ebooks), kword, openoffice, svg.
|
2045 |
|
2080 |
|
2046 |
* Openoffice: supported natively, but needs the unzip command to be
|
2081 |
* Openoffice: supported natively, but needs the unzip command to be
|
2047 |
installed.
|
2082 |
installed.
|
2048 |
|
2083 |
|
2049 |
* PDF: pdftotext is part of the Xpdf or Poppler packages.
|
2084 |
* PDF: pdftotext is part of the Xpdf or Poppler packages.
|
|
... |
|
... |
2051 |
* Postscript: pstotext.
|
2086 |
* Postscript: pstotext.
|
2052 |
|
2087 |
|
2053 |
* MS Word: antiword.
|
2088 |
* MS Word: antiword.
|
2054 |
|
2089 |
|
2055 |
* MS Excel and PowerPoint: catdoc.
|
2090 |
* MS Excel and PowerPoint: catdoc.
|
|
|
2091 |
|
|
|
2092 |
* MS Open XML (docx): needs xsltproc.
|
2056 |
|
2093 |
|
2057 |
* Wordperfect files: libwpd.
|
2094 |
* Wordperfect files: libwpd.
|
2058 |
|
2095 |
|
2059 |
* RTF: unrtf
|
2096 |
* RTF: unrtf
|
2060 |
|
2097 |
|
|
... |
|
... |
2065 |
|
2102 |
|
2066 |
* dvi: dvips
|
2103 |
* dvi: dvips
|
2067 |
|
2104 |
|
2068 |
* djvu: DjVuLibre
|
2105 |
* djvu: DjVuLibre
|
2069 |
|
2106 |
|
2070 |
* mp3: Recoll will use the id3info command from the id3lib package to
|
2107 |
* mp3, flac, ogg vorbis: Recoll releases before 1.13 use the id3info
|
2071 |
extract tag information. Without it, only the file names will be
|
2108 |
command from the id3lib package to extract mp3 tag information. (Some
|
2072 |
indexed.
|
2109 |
gcc versions after 4.4 may have trouble compiling id3lib. You can find
|
2073 |
|
2110 |
a workaround here), metaflac (standard flac tools) for flac files, and
|
2074 |
* flac files need metaflac.
|
2111 |
ogginfo (vorbis tools) for ogg files. Releases 1.14 and later use a
|
2075 |
|
2112 |
single Python filter based on mutagen for all audio file types.
|
2076 |
* ogg files need ogginfo.
|
|
|
2077 |
|
2113 |
|
2078 |
* Pictures: Recoll uses the Exiftool Perl package to extract tag
|
2114 |
* Pictures: Recoll uses the Exiftool Perl package to extract tag
|
2079 |
information. Most image file formats are supported. Note that there
|
2115 |
information. Most image file formats are supported. Note that there
|
2080 |
may not be much interest in indexing the technical tags (image size,
|
2116 |
may not be much interest in indexing the technical tags (image size,
|
2081 |
aperture, etc.). This is only of interest if you store personal tags
|
2117 |
aperture, etc.). This is only of interest if you store personal tags
|
2082 |
or textual descriptions inside the image files.
|
2118 |
or textual descriptions inside the image files.
|
2083 |
|
2119 |
|
2084 |
* chm: files in microsoft help format need Python and the pychm module
|
2120 |
* chm: files in microsoft help format need Python and the pychm module
|
2085 |
(which needs chmlib).
|
2121 |
(which needs chmlib).
|
2086 |
|
2122 |
|
2087 |
* ics: iCalendar files need Python and the icalendar module.
|
2123 |
* ics: up to Recoll 1.13, iCalendar files need Python and the icalendar
|
|
|
2124 |
module. For newer versions, icalendar is not needed.
|
2088 |
|
2125 |
|
2089 |
* zip: Zip archives need Python (and the standard zipfile module).
|
2126 |
* zip: Zip archives need Python (and the standard zipfile module).
|
2090 |
|
2127 |
|
2091 |
Text, HTML, mail folders, Openoffice and Scribus files are processed
|
2128 |
Text, HTML, mail folders, Openoffice and Scribus files are processed
|
2092 |
internally. Lyx is used to index Lyx files. Many filters need sed and awk.
|
2129 |
internally. Lyx is used to index Lyx files. Many filters need iconv and
|
|
|
2130 |
the standard sed and awk.
|
2093 |
|
2131 |
|
2094 |
----------------------------------------------------------------------
|
2132 |
----------------------------------------------------------------------
|
2095 |
|
2133 |
|
2096 |
7.3. Building from source
|
2134 |
7.3. Building from source
|
2097 |
|
2135 |
|
2098 |
7.3.1. Prerequisites
|
2136 |
7.3.1. Prerequisites
|
2099 |
|
2137 |
|
2100 |
At the very least, you will need to download and install the xapian core
|
2138 |
C++ compiler. Up to Recoll version 1.13.04, its absence can manifest
|
2101 |
package and the qt run-time and development packages. Check the Recoll
|
2139 |
itself by strange messages about a missing iconv_open.
|
|
|
2140 |
|
|
|
2141 |
Development files for Xapian core
|
|
|
2142 |
|
|
|
2143 |
Development files for Qt.
|
|
|
2144 |
|
|
|
2145 |
Development files for X11 and zlib.
|
|
|
2146 |
|
2102 |
download page for up to date version information.
|
2147 |
Check the Recoll download page for up-to-date version information.
|
2103 |
|
2148 |
|
2104 |
You will most probably be able to find a binary package for qt for your
|
2149 |
You will most probably be able to find a binary package for Qt for your
|
2105 |
system. You may have to compile Xapian but this is not difficult (if you
|
2150 |
system. You may have to compile Xapian but this is not difficult (if you
|
2106 |
are using FreeBSD, there is a port).
|
2151 |
are using FreeBSD, there is a port).
|
2107 |
|
2152 |
|
2108 |
You may also need libiconv. Recoll currently uses version 1.9 (this should
|
2153 |
You may also need libiconv. Recoll currently uses version 1.9 (this should
|
2109 |
not be critical). On Linux systems, the iconv interface is part of libc
|
2154 |
not be critical). On Linux systems, the iconv interface is part of libc
|
|
... |
|
... |
2111 |
|
2156 |
|
2112 |
----------------------------------------------------------------------
|
2157 |
----------------------------------------------------------------------
|
2113 |
|
2158 |
|
2114 |
7.3.2. Building
|
2159 |
7.3.2. Building
|
2115 |
|
2160 |
|
2116 |
Recoll has been built on Linux, FreeBSD, macosx, and Solaris, most
|
2161 |
Recoll has been built on Linux, FreeBSD, Mac OS X, and Solaris. Most
|
2117 |
versions after 2005 should be ok, maybe some older ones too (Solaris 8 is
|
2162 |
versions after 2005 should be ok, maybe some older ones too (Solaris 8 is
|
2118 |
ok). If you build on another system, and need to modify things, I would
|
2163 |
ok). If you build on another system, and need to modify things, I would
|
2119 |
very much welcome patches.
|
2164 |
very much welcome patches.
|
2120 |
|
2165 |
|
2121 |
Depending on the Qt 3 configuration on your system, you may have to set
|
2166 |
Depending on the Qt 3 configuration on your system, you may have to set
|
|
... |
|
... |
2280 |
The default configuration will index your home directory. If this is not
|
2325 |
The default configuration will index your home directory. If this is not
|
2281 |
appropriate, start recoll to create a blank configuration, click Cancel,
|
2326 |
appropriate, start recoll to create a blank configuration, click Cancel,
|
2282 |
and edit the configuration file before restarting the command. This will
|
2327 |
and edit the configuration file before restarting the command. This will
|
2283 |
start the initial indexing, which may take some time.
|
2328 |
start the initial indexing, which may take some time.
|
2284 |
|
2329 |
|
|
|
2330 |
Most of the following parameters can be changed from the Index
|
|
|
2331 |
Configuration menu in the recoll interface. Some can only be set by
|
|
|
2332 |
editing the configuration file.
|
|
|
2333 |
|
|
|
2334 |
----------------------------------------------------------------------
|
|
|
2335 |
|
2285 |
Paramers affecting what we index:
|
2336 |
7.4.1.1. Parameters affecting what documents we index:
|
2286 |
|
2337 |
|
2287 |
topdirs
|
2338 |
topdirs
|
2288 |
|
2339 |
|
2289 |
Specifies the list of directories or files to index (recursively
|
2340 |
Specifies the list of directories or files to index (recursively
|
2290 |
for directories). The indexer will not follow symbolic links
|
2341 |
for directories). You can use symbolic links as elements of this
|
2291 |
inside the indexed trees by default (see the followLinks options
|
2342 |
list. See the followLinks option about following symbolic links
|
2292 |
though).
|
2343 |
found under the top elements (not followed by default).
|
2293 |
|
2344 |
|
2294 |
skippedNames
|
2345 |
skippedNames
|
2295 |
|
2346 |
|
2296 |
A space-separated list of patterns for names of files or
|
2347 |
A space-separated list of patterns for names of files or
|
2297 |
directories that should be completely ignored. The list defined in
|
2348 |
directories that should be completely ignored. The list defined in
|
|
... |
|
... |
2401 |
|
2452 |
|
2402 |
The path to the Beagle indexing queue. This is hard-coded in the
|
2453 |
The path to the Beagle indexing queue. This is hard-coded in the
|
2403 |
Beagle plugin as ~/.beagle/ToIndex so there should be no need to
|
2454 |
Beagle plugin as ~/.beagle/ToIndex so there should be no need to
|
2404 |
change it.
|
2455 |
change it.
|
2405 |
|
2456 |
|
2406 |
Parameters affecting where and how we store things:
|
2457 |
----------------------------------------------------------------------
|
2407 |
|
2458 |
|
2408 |
dbdir
|
2459 |
7.4.1.2. Parameters affecting how we generate terms:
|
2409 |
|
2460 |
|
2410 |
The name of the Xapian data directory. It will be created if
|
2461 |
Changing some of these parameters will imply a full reindex. Also, when
|
2411 |
needed when the index is initialized. If this is not an absolute
|
2462 |
using multiple indexes, it may not make sense to search indexes that don't
|
2412 |
path, it will be interpreted relative to the configuration
|
2463 |
share the values for these parameters, because they usually affect both
|
2413 |
directory. The value can have embedded spaces but starting or
|
2464 |
search and index operations.
|
2414 |
trailing spaces will be trimmed. You cannot use quotes here.
|
|
|
2415 |
|
2465 |
|
2416 |
maxfsoccuppc
|
2466 |
nonumbers
|
2417 |
|
2467 |
|
2418 |
Maximum file system occupation before we stop indexing. The value
|
2468 |
If this is set to true, no terms will be generated for numbers. For
|
2419 |
is a percentage, corresponding to what the "Capacity" df output
|
2469 |
example "123", "1.5e6", 192.168.1.4, would not be indexed
|
2420 |
column shows. The default value is 0, meaning no checking.
|
2470 |
("value123" would still be). Numbers are often quite interesting
|
|
|
2471 |
to search for, and this should probably not be set except for
|
|
|
2472 |
special situations, ie, scientific documents with huge amounts of
|
|
|
2473 |
numbers in them. This can only be set for a whole index, not for a
|
|
|
2474 |
subtree.
|
2421 |
|
2475 |
|
2422 |
mboxcachedir
|
2476 |
nocjk
|
2423 |
|
2477 |
|
2424 |
The directory where mbox message offsets cache files are held.
|
2478 |
If this is set to true, specific East Asian (Chinese, Korean, Japanese)
|
2425 |
This is normally $RECOLL_CONFDIR/mboxcache, but it may be useful
|
2479 |
characters/word splitting is turned off. This will save a small
|
2426 |
to share a directory between different configurations.
|
2480 |
amount of cpu if you have no CJK documents. If your document base
|
|
|
2481 |
does include such text but you are not interested in searching it,
|
|
|
2482 |
setting nocjk may be a significant time and space saver.
|
2427 |
|
2483 |
|
2428 |
mboxcacheminmbs
|
2484 |
cjkngramlen
|
2429 |
|
2485 |
|
2430 |
The minimum mbox file size over which we cache the offsets. There
|
2486 |
This lets you adjust the size of n-grams used for indexing CJK
|
2431 |
is really no sense in caching offsets for small files. The default
|
2487 |
text. The default value of 2 is probably appropriate in most
|
2432 |
is 5 MB.
|
2488 |
cases. A value of 3 would allow more precision and efficiency on
|
2433 |
|
2489 |
longer words, but the index will be approximately twice as large.
|
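To make the notion of n-gram size concrete, here is a minimal sketch of fixed-size character n-grams. It is an illustration of the general technique only: the function name char_ngrams is hypothetical, and the actual Recoll splitter may emit a different set of terms for the same text.

```python
def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(text) < n:
        # Text shorter than n: keep it as a single term (or nothing if empty).
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(char_ngrams("中文检索", 2))
# prints: ['中文', '文检', '检索']
print(char_ngrams("中文检索", 3))
# prints: ['中文检', '文检索']
```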
2434 |
webcachedir
|
|
|
2435 |
|
|
|
2436 |
This is only used by the Beagle web browser plugin indexing code,
|
|
|
2437 |
and defines where the cache for visited pages will live. Default:
|
|
|
2438 |
$RECOLL_CONFDIR/webcache
|
|
|
2439 |
|
|
|
2440 |
webcachemaxmbs
|
|
|
2441 |
|
|
|
2442 |
This is only used by the Beagle web browser plugin indexing code,
|
|
|
2443 |
and defines the maximum size for the web page cache. Default: 40
|
|
|
2444 |
MB.
|
|
|
2445 |
|
|
|
2446 |
idxflushmb
|
|
|
2447 |
|
|
|
2448 |
Threshold (megabytes of new text data) where we flush from memory
|
|
|
2449 |
to disk index. Setting this can help control memory usage. A value
|
|
|
2450 |
of 0 means no explicit flushing, letting Xapian use its own
|
|
|
2451 |
default, which is flushing every 10000 documents (memory usage
|
|
|
2452 |
depends on average document size). The default value is 10.
|
|
|
2453 |
|
|
|
2454 |
Miscellani:
|
|
|
2455 |
|
|
|
2456 |
loglevel,daemloglevel
|
|
|
2457 |
|
|
|
2458 |
Verbosity level for recoll and recollindex. A value of 4 lists
|
|
|
2459 |
quite a lot of debug/information messages. 2 only lists errors.
|
|
|
2460 |
The daemversion is specific to the indexing monitor daemon.
|
|
|
2461 |
|
|
|
2462 |
logfilename, daemlogfilename
|
|
|
2463 |
|
|
|
2464 |
Where the messages should go. 'stderr' can be used as a special
|
|
|
2465 |
value, and is the default. The daemversion is specific to the
|
|
|
2466 |
indexing monitor daemon.
|
|
|
2467 |
|
2490 |
|
2468 |
indexstemminglanguages
|
2491 |
indexstemminglanguages
|
2469 |
|
2492 |
|
2470 |
A list of languages for which the stem expansion databases will be
|
2493 |
A list of languages for which the stem expansion databases will be
|
2471 |
built. See recollindex(1) or use the recollindex -l command for
|
2494 |
built. See recollindex(1) or use the recollindex -l command for
|
|
... |
|
... |
2480 |
character set definition (ie: plain text files). This can be
|
2503 |
character set definition (ie: plain text files). This can be
|
2481 |
redefined for any sub-directory. If it is not set at all, the
|
2504 |
redefined for any sub-directory. If it is not set at all, the
|
2482 |
character set used is the one defined by the nls environment
|
2505 |
character set used is the one defined by the nls environment
|
2483 |
(LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
|
2506 |
(LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
|
2484 |
|
2507 |
|
2485 |
filtermaxseconds
|
|
|
2486 |
|
|
|
2487 |
Maximum filter execution time, after which it is aborted. Some
|
|
|
2488 |
postscript programs just loop...
|
|
|
2489 |
|
|
|
2490 |
maildefcharset
|
2508 |
maildefcharset
|
2491 |
|
2509 |
|
2492 |
This can be used to define the default character set specifically
|
2510 |
This can be used to define the default character set specifically
|
2493 |
for mail messages which don't specify it. This is mainly useful
|
2511 |
for mail messages which don't specify it. This is mainly useful
|
2494 |
for readpst (libpst) dumps, which are utf-8 but do not say so.
|
2512 |
for readpst (libpst) dumps, which are utf-8 but do not say so.
|
|
... |
|
... |
2496 |
localfields
|
2514 |
localfields
|
2497 |
|
2515 |
|
2498 |
This allows setting fields for all documents under a given
|
2516 |
This allows setting fields for all documents under a given
|
2499 |
directory. Typical usage would be to set an "rclaptg" field, to be
|
2517 |
directory. Typical usage would be to set an "rclaptg" field, to be
|
2500 |
used in mimeview to select a specific viewer. If several fields
|
2518 |
used in mimeview to select a specific viewer. If several fields
|
2501 |
are to be set, they should be separated with a ':' character
|
2519 |
are to be set, they should be separated with a colon (':')
|
2502 |
(which there is currently no way to escape). Ie: localfields=
|
2520 |
character (which there is currently no way to escape). Ie:
|
2503 |
rclaptg=gnus:other = val, then select specifier viewer with
|
2521 |
localfields= rclaptg=gnus:other = val, then select the specific
|
2504 |
mimetype|tag=... in mimeview.
|
2522 |
viewer with mimetype|tag=... in mimeview.
|
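The localfields value format described above (name=value pairs separated by colons, with no escaping) can be sketched as follows. The function name parse_localfields is hypothetical, and trimming the white space around names and values is an assumption made for the example.

```python
def parse_localfields(value):
    """Split a localfields value into a dict of field names and values."""
    fields = {}
    for item in value.split(":"):
        if not item.strip():
            continue  # ignore empty elements such as a trailing ':'
        name, sep, val = item.partition("=")
        if not sep:
            raise ValueError("missing '=' in %r" % item)
        # Assumption: spaces around the name and the value are not significant.
        fields[name.strip()] = val.strip()
    return fields


print(parse_localfields("rclaptg=gnus:other = val"))
# prints: {'rclaptg': 'gnus', 'other': 'val'}
```

As the colon cannot be escaped, a value containing one would be cut at that point, which this sketch reproduces.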
|
|
2523 |
|
|
|
2524 |
----------------------------------------------------------------------
|
|
|
2525 |
|
|
|
2526 |
7.4.1.3. Parameters affecting where and how we store things:
|
|
|
2527 |
|
|
|
2528 |
dbdir
|
|
|
2529 |
|
|
|
2530 |
The name of the Xapian data directory. It will be created if
|
|
|
2531 |
needed when the index is initialized. If this is not an absolute
|
|
|
2532 |
path, it will be interpreted relative to the configuration
|
|
|
2533 |
directory. The value can have embedded spaces but starting or
|
|
|
2534 |
trailing spaces will be trimmed. You cannot use quotes here.
|
|
|
2535 |
|
|
|
2536 |
maxfsoccuppc
|
|
|
2537 |
|
|
|
2538 |
Maximum file system occupation before we stop indexing. The value
|
|
|
2539 |
is a percentage, corresponding to what the "Capacity" df output
|
|
|
2540 |
column shows. The default value is 0, meaning no checking.
|
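The maxfsoccuppc check can be pictured with the sketch below. It is not the indexer's code: both function names are hypothetical, and the percentage computed from shutil.disk_usage is only an approximation of the df "Capacity" column, which may account for reserved blocks differently.

```python
import shutil


def fs_occupation_pc(path):
    """Approximate file system occupation for path, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def should_stop_indexing(occupation_pc, maxfsoccuppc):
    """Decide whether indexing should stop, given the configured limit."""
    if maxfsoccuppc <= 0:
        return False  # 0 is the default and means no checking
    return occupation_pc >= maxfsoccuppc
```

A caller would typically combine both, for example should_stop_indexing(fs_occupation_pc(dbdir), 90), where dbdir stands for the index directory.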
|
|
2541 |
|
|
|
2542 |
mboxcachedir
|
|
|
2543 |
|
|
|
2544 |
The directory where mbox message offsets cache files are held.
|
|
|
2545 |
This is normally $RECOLL_CONFDIR/mboxcache, but it may be useful
|
|
|
2546 |
to share a directory between different configurations.
|
|
|
2547 |
|
|
|
2548 |
mboxcacheminmbs
|
|
|
2549 |
|
|
|
2550 |
The minimum mbox file size over which we cache the offsets. There
|
|
|
2551 |
is really no sense in caching offsets for small files. The default
|
|
|
2552 |
is 5 MB.
|
|
|
2553 |
|
|
|
2554 |
webcachedir
|
|
|
2555 |
|
|
|
2556 |
This is only used by the Beagle web browser plugin indexing code,
|
|
|
2557 |
and defines where the cache for visited pages will live. Default:
|
|
|
2558 |
$RECOLL_CONFDIR/webcache
|
|
|
2559 |
|
|
|
2560 |
webcachemaxmbs
|
|
|
2561 |
|
|
|
2562 |
This is only used by the Beagle web browser plugin indexing code,
|
|
|
2563 |
and defines the maximum size for the web page cache. Default: 40
|
|
|
2564 |
MB.
|
|
|
2565 |
|
|
|
2566 |
idxflushmb
|
|
|
2567 |
|
|
|
2568 |
Threshold (megabytes of new text data) where we flush from memory
|
|
|
2569 |
to disk index. Setting this can help control memory usage. A value
|
|
|
2570 |
of 0 means no explicit flushing, letting Xapian use its own
|
|
|
2571 |
default, which is flushing every 10000 documents (memory usage
|
|
|
2572 |
depends on average document size). The default value is 10.
|
|
|
2573 |
|
|
|
2574 |
----------------------------------------------------------------------
|
|
|
2575 |
|
|
|
2576 |
7.4.1.4. Miscellaneous parameters:
|
|
|
2577 |
|
|
|
2578 |
loglevel, daemloglevel
|
|
|
2579 |
|
|
|
2580 |
Verbosity level for recoll and recollindex. A value of 4 lists
|
|
|
2581 |
quite a lot of debug/information messages. 2 only lists errors.
|
|
|
2582 |
The daemloglevel variant is specific to the indexing monitor daemon.
|
|
|
2583 |
|
|
|
2584 |
logfilename, daemlogfilename
|
|
|
2585 |
|
|
|
2586 |
Where the messages should go. 'stderr' can be used as a special
|
|
|
2587 |
value, and is the default. The daemlogfilename variant is specific to the
|
|
|
2588 |
indexing monitor daemon.
|
|
|
2589 |
|
|
|
2590 |
filtermaxseconds
|
|
|
2591 |
|
|
|
2592 |
Maximum filter execution time, after which it is aborted. Some
|
|
|
2593 |
postscript programs just loop...
|
2505 |
|
2594 |
|
2506 |
filtersdir
|
2595 |
filtersdir
|
2507 |
|
2596 |
|
2508 |
A directory to search for the external filter scripts used to
|
2597 |
A directory to search for the external filter scripts used to
|
2509 |
index some types of files. The value should not be changed, except
|
2598 |
index some types of files. The value should not be changed, except
|
|
... |
|
... |
2540 |
|
2629 |
|
2541 |
If this is set, the aspell dictionary generation is turned off.
|
2630 |
If this is set, the aspell dictionary generation is turned off.
|
2542 |
Useful for cases where you don't need the functionality or when it
|
2631 |
Useful for cases where you don't need the functionality or when it
|
2543 |
is unusable because aspell crashes during dictionary generation.
|
2632 |
is unusable because aspell crashes during dictionary generation.
|
2544 |
|
2633 |
|
2545 |
nocjk
|
|
|
2546 |
|
|
|
2547 |
If this set to true, specific east asian (Chinese Korean Japanese)
|
|
|
2548 |
characters/word splitting is turned off. This will save a small
|
|
|
2549 |
amount of cpu if you have no CJK documents. If your document base
|
|
|
2550 |
does include such text but you are not interested in searching it,
|
|
|
2551 |
setting nocjk may be a significant time and space saver.
|
|
|
2552 |
|
|
|
2553 |
cjkngramlen
|
|
|
2554 |
|
|
|
2555 |
This lets you adjust the size of n-grams used for indexing CJK
|
|
|
2556 |
text. The default value of 2 is probably appropriate in most
|
|
|
2557 |
cases. A value of 3 would allow more precision and efficiency on
|
|
|
2558 |
longer words, but the index will be approximately twice as large.
|
|
|
2559 |
|
|
|
2560 |
guesscharset
|
2634 |
guesscharset
|
2561 |
|
2635 |
|
2562 |
Decide if we try to guess the character set of files if no
|
2636 |
Decide if we try to guess the character set of files if no
|
2563 |
internal value is available (ie: for plain text files). This does
|
2637 |
internal value is available (ie: for plain text files). This does
|
2564 |
not work well in general, and should probably not be used.
|
2638 |
not work well in general, and should probably not be used.
|
2565 |
|
2639 |
|
2566 |
----------------------------------------------------------------------
|
2640 |
----------------------------------------------------------------------
|
2567 |
|
2641 |
|
|
|
2642 |
7.4.2. The fields file
|
|
|
2643 |
|
|
|
2644 |
This file contains information about dynamic fields handling in Recoll.
|
|
|
2645 |
Some very basic fields have hard-wired behaviour, and, mostly, you should
|
|
|
2646 |
not change the original data inside the fields file. But you can create
|
|
|
2647 |
custom fields fitting your data and handle them just like they were native
|
|
|
2648 |
ones.
|
|
|
2649 |
|
|
|
2650 |
The fields file has several sections, which each define an aspect of
|
|
|
2651 |
fields processing. Quite often, you'll have to modify several sections to
|
|
|
2652 |
obtain the desired behaviour.
|
|
|
2653 |
|
|
|
2654 |
We will only give a short description here; you should refer to the
|
|
|
2655 |
comments inside the file for more detailed information.
|
|
|
2656 |
|
|
|
2657 |
Field names should be lowercase alphabetic ASCII.
|
|
|
2658 |
|
|
|
2659 |
[prefixes]
|
|
|
2660 |
|
|
|
2661 |
A field becomes indexed (searchable) by having a prefix defined in
|
|
|
2662 |
this section.
|
|
|
2663 |
|
|
|
2664 |
[stored]
|
|
|
2665 |
|
|
|
2666 |
A field becomes stored (displayable inside results) by having its
|
|
|
2667 |
name listed in this section (typically with an empty value).
|
|
|
2668 |
|
|
|
2669 |
[aliases]
|
|
|
2670 |
|
|
|
2671 |
This section defines lists of synonyms for the canonical names
|
|
|
2672 |
used inside the [prefixes] and [stored] sections.
|
|
|
2673 |
|
|
|
2674 |
filter-specific sections
|
|
|
2675 |
|
|
|
2676 |
Some filters may need specific configuration for handling fields.
|
|
|
2677 |
Only the mail message filter currently has such a section (named
|
|
|
2678 |
[mail]). It allows indexing arbitrary mail headers in addition to
|
|
|
2679 |
the ones indexed by default. Other such sections may appear in the
|
|
|
2680 |
future.
|
|
|
2681 |
|
|
|
2682 |
Here follows a small example of a personal fields file. This would extract
|
|
|
2683 |
a specific mail header and use it as a searchable field, with data
|
|
|
2684 |
displayable inside result lists. (Side note: as the mail filter does no
|
|
|
2685 |
decoding on the values, only plain ASCII headers can be indexed, and only
|
|
|
2686 |
the first occurrence will be used for headers that occur several times).
|
|
|
2687 |
|
|
|
2688 |
[prefixes]
|
|
|
2689 |
# Index mailmytag contents (with the given prefix)
|
|
|
2690 |
mailmytag = XMTAG
|
|
|
2691 |
|
|
|
2692 |
[stored]
|
|
|
2693 |
# Store mailmytag inside the document data record (so that it can be
|
|
|
2694 |
# displayed - as %(mailmytag) - in result lists).
|
|
|
2695 |
mailmytag =
|
|
|
2696 |
|
|
|
2697 |
[mail]
|
|
|
2698 |
# Extract the X-My-Tag mail header, and use it internally with the
|
|
|
2699 |
# mailmytag field name
|
|
|
2700 |
x-my-tag = mailmytag
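To check how the three sections of such a file fit together, the sketch below reads the example with Python's standard configparser module and reports, for each field, its index prefix, whether it is stored, and which mail headers feed it. This is only an approximation under stated assumptions: Recoll uses its own configuration parser, a real fields file may contain syntax that configparser does not accept, and the summarize_fields helper is hypothetical.

```python
import configparser

# The example fields file from above, without its comments.
FIELDS_TEXT = """
[prefixes]
mailmytag = XMTAG

[stored]
mailmytag =

[mail]
x-my-tag = mailmytag
"""


def summarize_fields(text: str) -> dict:
    """Map each field name to its prefix, stored flag and source mail headers.

    Hypothetical helper: configparser stands in for Recoll's own parser.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)

    def section(name: str) -> dict:
        return dict(parser[name]) if parser.has_section(name) else {}

    prefixes = section("prefixes")
    stored = section("stored")
    mail = section("mail")

    summary = {}
    for field in set(prefixes) | set(stored) | set(mail.values()):
        summary[field] = {
            "prefix": prefixes.get(field),   # None means not indexed
            "stored": field in stored,       # listed in [stored]
            "headers": sorted(h for h, f in mail.items() if f == field),
        }
    return summary


summary = summarize_fields(FIELDS_TEXT)
print(summary)
```

A field that shows a prefix of None here would be extracted but not searchable, which is the kind of mismatch between sections that is easy to miss when editing the file by hand.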
|
|
|
2701 |
|
|
|
2702 |
----------------------------------------------------------------------
|
|
|
2703 |
|
2568 |
7.4.2. The mimemap file
|
2704 |
7.4.3. The mimemap file
|
2569 |
|
2705 |
|
2570 |
mimemap specifies the file name extension to mime type mappings.
|
2706 |
mimemap specifies the file name extension to mime type mappings.
|
2571 |
|
2707 |
|
2572 |
For file names without an extension, or with an unknown one, the system's
|
2708 |
For file names without an extension, or with an unknown one, the system's
|
2573 |
file -i command will be executed to determine the mime type (this can be
|
2709 |
file -i command will be executed to determine the mime type (this can be
|
|
... |
|
... |
2589 |
given Recoll version. Having it there avoids cluttering the more
|
2725 |
given Recoll version. Having it there avoids cluttering the more
|
2590 |
user-oriented and locally customized skippedNames.
|
2726 |
user-oriented and locally customized skippedNames.
|
2591 |
|
2727 |
|
2592 |
----------------------------------------------------------------------
|
2728 |
----------------------------------------------------------------------
|
2593 |
|
2729 |
|
2594 |
7.4.3. The mimeconf file
|
2730 |
7.4.4. The mimeconf file
|
2595 |
|
2731 |
|
2596 |
mimeconf specifies how the different mime types are handled for indexing,
|
2732 |
mimeconf specifies how the different mime types are handled for indexing,
|
2597 |
and which icons are displayed in the recoll result lists.
|
2733 |
and which icons are displayed in the recoll result lists.
|
2598 |
|
2734 |
|
2599 |
Changing the parameters in the [index] section is probably not a good idea
|
2735 |
Changing the parameters in the [index] section is probably not a good idea
|
|
... |
|
... |
2603 |
recoll in the result lists (the values are the basenames of the png images
|
2739 |
recoll in the result lists (the values are the basenames of the png images
|
2604 |
inside the iconsdir directory (specified in recoll.conf)).
|
2740 |
inside the iconsdir directory (specified in recoll.conf)).
|
2605 |
|
2741 |
|
2606 |
----------------------------------------------------------------------
|
2742 |
----------------------------------------------------------------------
|
2607 |
|
2743 |
|
2608 |
7.4.4. The mimeview file
|
2744 |
7.4.5. The mimeview file
|
2609 |
|
2745 |
|
2610 |
mimeview specifies which programs are started when you click on an Edit
|
2746 |
mimeview specifies which programs are started when you click on an Edit
|
2611 |
link in a result list. For example, HTML is normally displayed using firefox, but
|
2747 |
link in a result list. For example, HTML is normally displayed using firefox, but
|
2612 |
you may prefer Konqueror, your openoffice.org program might be named
|
2748 |
you may prefer Konqueror, your openoffice.org program might be named
|
2613 |
ooffice instead of openoffice, etc.
|
2749 |
ooffice instead of openoffice, etc.
|
|
... |
|
... |
2631 |
user preferences, all mimeview entries will be ignored except the one
|
2767 |
user preferences, all mimeview entries will be ignored except the one
|
2632 |
labelled application/x-all (which is set to use xdg-open by default).
|
2768 |
labelled application/x-all (which is set to use xdg-open by default).
|
2633 |
|
2769 |
|
2634 |
----------------------------------------------------------------------
|
2770 |
----------------------------------------------------------------------
|
2635 |
|
2771 |
|
2636 |
7.4.5. Examples of configuration adjustments
|
2772 |
7.4.6. Examples of configuration adjustments
|
2637 |
|
2773 |
|
2638 |
7.4.5.1. Adding an external viewer for a non-indexed type
|
2774 |
7.4.6.1. Adding an external viewer for a non-indexed type
|
2639 |
|
2775 |
|
2640 |
Imagine that you have some kind of file which does not have indexable
|
2776 |
Imagine that you have some kind of file which does not have indexable
|
2641 |
content, but for which you would like to have a functional Edit link in
|
2777 |
content, but for which you would like to have a functional Edit link in
|
2642 |
the result list (when found by file name). The file names end in .blob and
|
2778 |
the result list (when found by file name). The file names end in .blob and
|
2643 |
can be displayed by application blobviewer.
|
2779 |
can be displayed by application blobviewer.
|
|
... |
|
... |
2665 |
configuration, which you do not need to alter. mimeview can also be
|
2801 |
configuration, which you do not need to alter. mimeview can also be
|
2666 |
modified from the GUI.
|
2802 |
modified from the GUI.
|
2667 |
|
2803 |
|
2668 |
----------------------------------------------------------------------
|
2804 |
----------------------------------------------------------------------
|
2669 |
|
2805 |
|
2670 |
7.4.5.2. Adding indexing support for a new file type
|
2806 |
7.4.6.2. Adding indexing support for a new file type
|
2671 |
|
2807 |
|
2672 |
Let us now imagine that the above .blob files actually contain indexable
|
2808 |
Let us now imagine that the above .blob files actually contain indexable
|
2673 |
text and that you know how to extract it with a command line program.
|
2809 |
text and that you know how to extract it with a command line program.
|
2674 |
Getting Recoll to index the files is easy. You need to perform the above
|
2810 |
Getting Recoll to index the files is easy. You need to perform the above
|
2675 |
alteration, and also to add data to the mimeconf file (typically in
|
2811 |
alteration, and also to add data to the mimeconf file (typically in
|