|
a/src/README |
|
b/src/README |
|
... |
|
... |
32 |
|
32 |
|
33 |
2.1. Introduction
|
33 |
2.1. Introduction
|
34 |
|
34 |
|
35 |
2.2. Index storage
|
35 |
2.2. Index storage
|
36 |
|
36 |
|
37 |
2.2.1. Index formats
|
37 |
2.2.1. Xapian index formats
|
38 |
|
38 |
|
39 |
2.2.2. Security aspects
|
39 |
2.2.2. Security aspects
|
40 |
|
40 |
|
41 |
2.3. The indexing configuration
|
41 |
2.3. Indexing configuration
|
|
|
42 |
|
|
|
43 |
2.3.1. The indexing configuration GUI
|
42 |
|
44 |
|
43 |
2.4. Periodic indexing
|
45 |
2.4. Periodic indexing
|
44 |
|
46 |
|
45 |
2.4.1. Starting indexing
|
47 |
2.4.1. Starting indexing
|
46 |
|
48 |
|
|
... |
|
... |
104 |
|
106 |
|
105 |
4.4.4. The mimeview file
|
107 |
4.4.4. The mimeview file
|
106 |
|
108 |
|
107 |
4.4.5. Examples of configuration adjustments
|
109 |
4.4.5. Examples of configuration adjustments
|
108 |
|
110 |
|
|
|
111 |
4.5. The KDE Kicker Recoll applet
|
|
|
112 |
|
109 |
4.5. Extending Recoll
|
113 |
4.6. Extending Recoll
|
110 |
|
114 |
|
111 |
4.5.1. Writing a document filter
|
115 |
4.6.1. Writing a document filter
|
112 |
|
116 |
|
113 |
----------------------------------------------------------------------
|
117 |
----------------------------------------------------------------------
|
114 |
|
118 |
|
115 |
Chapter 1. Introduction
|
119 |
Chapter 1. Introduction
|
116 |
|
120 |
|
|
... |
|
... |
313 |
The index data directory (xapiandb) only contains data that can be
|
317 |
The index data directory (xapiandb) only contains data that can be
|
314 |
completely rebuilt by an index run, and it can always be destroyed safely.
|
318 |
completely rebuilt by an index run, and it can always be destroyed safely.
|
315 |
|
319 |
|
316 |
----------------------------------------------------------------------
|
320 |
----------------------------------------------------------------------
|
317 |
|
321 |
|
318 |
2.2.1. Index formats
|
322 |
2.2.1. Xapian index formats
|
|
|
323 |
|
|
|
324 |
If your first installation of Recoll was 1.9.0 or more recent, you can
|
|
|
325 |
skip this section.
|
319 |
|
326 |
|
320 |
Xapian has had two possible index formats for quite some time. The "old"
|
327 |
Xapian has had two possible index formats for quite some time. The "old"
|
321 |
one named Quartz, and the new one named Flint. Xapian 0.9 used Quartz by
|
328 |
one named Quartz, and the new one named Flint. Xapian 0.9 used Quartz by
|
322 |
default, but could use Flint if a specific environment variable
|
329 |
default, but could use Flint if a specific environment variable
|
323 |
(XAPIAN_PREFER_FLINT) was set. Xapian 1.0 still supports Quartz but will
|
330 |
(XAPIAN_PREFER_FLINT) was set. Xapian 1.0 still supports Quartz but will
|
|
... |
|
... |
352 |
mode of 0700 (access by owner only). As the index data directory is by
|
359 |
mode of 0700 (access by owner only). As the index data directory is by
|
353 |
default a sub-directory of the configuration directory, this should result
|
360 |
default a sub-directory of the configuration directory, this should result
|
354 |
in appropriate protection.
|
361 |
in appropriate protection.
|
355 |
|
362 |
|
356 |
If you use another setup, you should think of the kind of protection you
|
363 |
If you use another setup, you should think of the kind of protection you
|
357 |
need for your index, and set the directory and files access modes
|
364 |
need for your index, set the directory and file access modes
|
358 |
appropriately.
|
365 |
appropriately, and possibly adjust the umask used during index updates.
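
As a rough illustration of the advice above, the following Python sketch
creates an index directory that only its owner can access, and applies a
restrictive umask while doing so. The function name and the temporary path
are hypothetical and not part of Recoll, and the sketch assumes a POSIX
system.

```python
import os
import stat
import tempfile


def make_private_index_dir(path):
    """Create (or tighten) a directory so that only the owner can access it.

    Hypothetical helper for illustration; Recoll does not provide it.
    """
    # Files created while this umask is active get no group/other access.
    old_umask = os.umask(0o077)
    try:
        os.makedirs(path, mode=0o700, exist_ok=True)
        # makedirs does not change the mode of an existing directory,
        # so set it explicitly as well.
        os.chmod(path, 0o700)
    finally:
        # Restore the previous umask for the rest of the process.
        os.umask(old_umask)
    return stat.S_IMODE(os.stat(path).st_mode)


# Usage on a throwaway location; a real index would live elsewhere.
base = tempfile.mkdtemp()
index_dir = os.path.join(base, "xapiandb")
mode = make_private_index_dir(index_dir)
print(oct(mode))
```

Whether 0700 is the right mode depends on who needs to read the index; a
shared index would need a looser mode and umask.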
|
359 |
|
366 |
|
360 |
----------------------------------------------------------------------
|
367 |
----------------------------------------------------------------------
|
361 |
|
368 |
|
362 |
2.3. The indexing configuration
|
369 |
2.3. Indexing configuration
|
363 |
|
370 |
|
364 |
You can control which areas of the file system are indexed, and how files
|
371 |
Variables set inside the Recoll configuration files control which areas of
|
365 |
are processed, by setting variables inside the Recoll configuration files.
|
372 |
the file system are indexed, and how files are processed. These variables
|
|
|
373 |
can be set either by editing the text files or using the dialogs in the
|
|
|
374 |
recoll GUI.
|
366 |
|
375 |
|
367 |
You can also use multiple indexes defined by separate configurations,
|
376 |
You can also use multiple indexes defined by separate configurations,
|
368 |
typically to separate personal and shared indexes, or to take advantage of
|
377 |
typically to separate personal and shared indexes, or to take advantage of
|
369 |
the organization of your data to improve search precision.
|
378 |
the organization of your data to improve search precision.
|
370 |
|
379 |
|
|
... |
|
... |
381 |
topdirs, which determines what subtrees get indexed.
|
390 |
topdirs, which determines what subtrees get indexed.
|
382 |
|
391 |
|
383 |
The applications needed to index file types other than text, HTML or email
|
392 |
The applications needed to index file types other than text, HTML or email
|
384 |
(ie: pdf, postscript, ms-word...) are described in the external packages
|
393 |
(ie: pdf, postscript, ms-word...) are described in the external packages
|
385 |
section
|
394 |
section
|
|
|
395 |
|
|
|
396 |
----------------------------------------------------------------------
|
|
|
397 |
|
|
|
398 |
2.3.1. The indexing configuration GUI
|
|
|
399 |
|
|
|
400 |
As of Recoll 1.10, most parameters for a given indexing configuration can
|
|
|
401 |
be set from a recoll GUI running on this configuration (either as default,
|
|
|
402 |
or by setting RECOLL_CONFDIR or using the -c option).
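
The two ways of selecting a non-default configuration can be sketched as
follows. This Python fragment only builds the command line and environment
and does not start anything; the helper name and the example directory are
hypothetical, while RECOLL_CONFDIR and -c are the mechanisms named above.

```python
import os


def recoll_invocation(confdir, use_env=False):
    """Return (argv, env) for running the recoll GUI on a given configuration.

    Hypothetical helper: it only prepares the arguments, it runs nothing.
    """
    env = dict(os.environ)
    if use_env:
        # Select the configuration through the environment variable.
        env["RECOLL_CONFDIR"] = confdir
        return ["recoll"], env
    # Select the configuration through the -c command line option.
    return ["recoll", "-c", confdir], env


# Example with a hypothetical configuration directory.
argv, env = recoll_invocation("/home/me/.recoll-shared")
print(argv)
```

The returned pair could be handed to subprocess.run(argv, env=env) on a
machine where recoll is installed.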
|
|
|
403 |
|
|
|
404 |
The interface is started from the Preferences menu. It has two main
|
|
|
405 |
panels. The first panel allows setting global variables, like the list of
|
|
|
406 |
top directories or the list of skipped paths. The second panel allows
|
|
|
407 |
setting variables that can be redefined for subdirectories. This second
|
|
|
408 |
panel has an initially empty list of customisation directories, to which
|
|
|
409 |
you can add. The variables are then set for the currently selected
|
|
|
410 |
directory (or at the top level if the empty line is selected).
|
|
|
411 |
|
|
|
412 |
The meaning of most entries in the interface is self-evident and
|
|
|
413 |
documented by a ToolTip popup on the text label. For more detail, you will
|
|
|
414 |
need to refer to the configuration section of this guide.
|
|
|
415 |
|
|
|
416 |
The configuration tool normally respects the comments and most of the
|
|
|
417 |
formatting inside the configuration file, so that it is quite possible to
|
|
|
418 |
use it on hand-edited files, which you might nevertheless want to back up
|
|
|
419 |
first...
|
386 |
|
420 |
|
387 |
----------------------------------------------------------------------
|
421 |
----------------------------------------------------------------------
|
388 |
|
422 |
|
389 |
2.4. Periodic indexing
|
423 |
2.4. Periodic indexing
|
390 |
|
424 |
|
|
... |
|
... |
715 |
|
749 |
|
716 |
There are two other elements which may be specified through the field
|
750 |
There are two other elements which may be specified through the field
|
717 |
syntax, but are somewhat special:
|
751 |
syntax, but are somewhat special:
|
718 |
|
752 |
|
719 |
* ext for specifying the file name extension (Ex: ext:html)
|
753 |
* ext for specifying the file name extension (Ex: ext:html)
|
|
|
754 |
|
|
|
755 |
* dir for specifying the file location (Ex: dir:/home/me/somedir).
|
|
|
756 |
Please note that this is quite inefficient, that it may produce very
|
|
|
757 |
slow searches, and that it may be worthwhile in some cases to set up
|
|
|
758 |
separate databases instead.
|
720 |
|
759 |
|
721 |
* mime for specifying the mime type. This one is quite special because
|
760 |
* mime for specifying the mime type. This one is quite special because
|
722 |
you can specify several values which will be OR'ed (the normal default
|
761 |
you can specify several values which will be OR'ed (the normal default
|
723 |
for the language is AND). Ex: mime:text/plain mime:text/html.
|
762 |
for the language is AND). Ex: mime:text/plain mime:text/html.
|
724 |
Specifying an explicit boolean operator or negation (-) before a mime
|
763 |
Specifying an explicit boolean operator or negation (-) before a mime
|
|
... |
|
... |
1201 |
|
1240 |
|
1202 |
* Wordperfect files: libwpd.
|
1241 |
* Wordperfect files: libwpd.
|
1203 |
|
1242 |
|
1204 |
* RTF: unrtf
|
1243 |
* RTF: unrtf
|
1205 |
|
1244 |
|
|
|
1245 |
* TeX: Recoll uses the untex program. Your distribution may have a
|
|
|
1246 |
package for it. If it doesn't, there is a copy of the source on the
|
|
|
1247 |
Recoll web site, because the program has no obvious home. The filter
|
|
|
1248 |
can also work with detex and will use it if it is installed.
|
|
|
1249 |
|
1206 |
* dvi: dvips
|
1250 |
* dvi: dvips
|
1207 |
|
1251 |
|
1208 |
* djvu: DjVuLibre
|
1252 |
* djvu: DjVuLibre
|
1209 |
|
1253 |
|
1210 |
* MP3: Recoll will use the id3info command from the id3lib package to
|
1254 |
* MP3: Recoll will use the id3info command from the id3lib package to
|
|
... |
|
... |
1497 |
Decide if we use the file -i system command as a final step for
|
1541 |
Decide if we use the file -i system command as a final step for
|
1498 |
determining the mime type for a file (the main procedure uses
|
1542 |
determining the mime type for a file (the main procedure uses
|
1499 |
suffix associations as defined in the mimemap file). This can be
|
1543 |
suffix associations as defined in the mimemap file). This can be
|
1500 |
useful for files with suffix-less names, but it will also cause
|
1544 |
useful for files with suffix-less names, but it will also cause
|
1501 |
the indexing of many bogus "text" files.
|
1545 |
the indexing of many bogus "text" files.
|
|
|
1546 |
|
|
|
1547 |
indexedmimetypes
|
|
|
1548 |
|
|
|
1549 |
Recoll normally indexes any file which it knows how to read. This
|
|
|
1550 |
list lets you restrict the indexed mime types to what you specify.
|
|
|
1551 |
If the variable is unspecified or the list empty (the default),
|
|
|
1552 |
all supported types are processed.
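
The selection rule described for indexedmimetypes can be sketched in
Python as below. This is an illustration of the documented behaviour
only, not Recoll's implementation; the function name and the sample type
lists are assumptions.

```python
def should_index(mime_type, supported_types, indexedmimetypes=()):
    """Decide whether a file of the given mime type would be indexed.

    Hypothetical model of the documented rule, not Recoll code.
    """
    # Types that cannot be read are never indexed.
    if mime_type not in supported_types:
        return False
    # Unspecified or empty list (the default): all supported types pass.
    if not indexedmimetypes:
        return True
    # Otherwise only the listed types are indexed.
    return mime_type in indexedmimetypes


# Sample values chosen for illustration.
supported = {"text/plain", "text/html", "application/pdf"}
print(should_index("application/pdf", supported))
print(should_index("application/pdf", supported, ["text/html"]))
```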
|
1502 |
|
1553 |
|
1503 |
indexallfilenames
|
1554 |
indexallfilenames
|
1504 |
|
1555 |
|
1505 |
Recoll indexes file names in a special section of the database to
|
1556 |
Recoll indexes file names in a special section of the database to
|
1506 |
allow specific file names searches using wild cards. This
|
1557 |
allow specific file names searches using wild cards. This
|
|
... |
|
... |
1534 |
|
1585 |
|
1535 |
If this is set, the aspell dictionary generation is turned off.
|
1586 |
If this is set, the aspell dictionary generation is turned off.
|
1536 |
Useful for cases where you don't need the functionality or when it
|
1587 |
Useful for cases where you don't need the functionality or when it
|
1537 |
is unusable because aspell crashes during dictionary generation.
|
1588 |
is unusable because aspell crashes during dictionary generation.
|
1538 |
|
1589 |
|
|
|
1590 |
nocjk
|
|
|
1591 |
|
|
|
1592 |
If this is set to true, specific East Asian (Chinese, Korean, Japanese)
|
|
|
1593 |
character/word splitting is turned off. This will save a small
|
|
|
1594 |
amount of cpu if you have no CJK documents. If your document base
|
|
|
1595 |
does include such text but you are not interested in searching it,
|
|
|
1596 |
setting nocjk may be a significant time and space saver.
|
|
|
1597 |
|
|
|
1598 |
cjkngramlen
|
|
|
1599 |
|
|
|
1600 |
This lets you adjust the size of n-grams used for indexing CJK
|
|
|
1601 |
text. The default value of 2 is probably appropriate in most
|
|
|
1602 |
cases. A value of 3 would allow more precision and efficiency on
|
|
|
1603 |
longer words, but the index will be approximately twice as large.
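
To make the effect of the n-gram length more concrete, here is a minimal
Python sketch of overlapping character n-grams. It assumes a plain
sliding window over the text; the function name is hypothetical and
Recoll's actual CJK splitter may differ in its details.

```python
def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of a string.

    Hypothetical illustration of n-gram splitting, not Recoll code.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(text) <= n:
        # A short string is kept as a single term (or nothing if empty).
        return [text] if text else []
    # Slide a window of n characters over the text, one step at a time.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


sample = "検索エンジン"  # a six-character sample string
print(char_ngrams(sample, 2))
print(char_ngrams(sample, 3))
```

With n set to 3, each term carries more context, so a long word matches
on fewer and more specific terms.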
|
|
|
1604 |
|
1539 |
----------------------------------------------------------------------
|
1605 |
----------------------------------------------------------------------
|
1540 |
|
1606 |
|
1541 |
4.4.2. The mimemap file
|
1607 |
4.4.2. The mimemap file
|
1542 |
|
1608 |
|
1543 |
mimemap specifies the file name extension to mime type mappings.
|
1609 |
mimemap specifies the file name extension to mime type mappings.
|
|
... |
|
... |
1666 |
You can find more details about writing a Recoll filter in the section
|
1732 |
You can find more details about writing a Recoll filter in the section
|
1667 |
about writing filters
|
1733 |
about writing filters
|
1668 |
|
1734 |
|
1669 |
----------------------------------------------------------------------
|
1735 |
----------------------------------------------------------------------
|
1670 |
|
1736 |
|
|
|
1737 |
4.5. The KDE Kicker Recoll applet
|
|
|
1738 |
|
|
|
1739 |
The Recoll source tree contains the source code to the recoll_applet, a
|
|
|
1740 |
small application derived from the find_applet. This can be used to add a
|
|
|
1741 |
small Recoll launcher to the KDE panel.
|
|
|
1742 |
|
|
|
1743 |
The applet is not automatically built with the main Recoll programs. To
|
|
|
1744 |
build it, you need to unpack the Recoll source code, then go to the
|
|
|
1745 |
kde/recoll_applet/ directory, and type the usual configure;make;make
|
|
|
1746 |
install.
|
|
|
1747 |
|
|
|
1748 |
You can then add the applet to the panel by right-clicking the panel and
|
|
|
1749 |
choosing the Add applet entry.
|
|
|
1750 |
|
|
|
1751 |
The recoll_applet has a small text window where you can type a Recoll
|
|
|
1752 |
query (in query language form), and an icon which can be used to restrict
|
|
|
1753 |
the search to certain types of files.
|
|
|
1754 |
|
|
|
1755 |
----------------------------------------------------------------------
|
|
|
1756 |
|
1671 |
4.5. Extending Recoll
|
1757 |
4.6. Extending Recoll
|
1672 |
|
1758 |
|
1673 |
4.5.1. Writing a document filter
|
1759 |
4.6.1. Writing a document filter
|
1674 |
|
1760 |
|
1675 |
Recoll filters are executable programs which translate from a specific
|
1761 |
Recoll filters are executable programs which translate from a specific
|
1676 |
format (ie: openoffice, acrobat, etc.) to the Recoll indexing input
|
1762 |
format (ie: openoffice, acrobat, etc.) to the Recoll indexing input
|
1677 |
format, which was chosen to be HTML.
|
1763 |
format, which was chosen to be HTML.
|
1678 |
|
1764 |
|