...

This document introduces full text search notions and describes the
installation and use of the Recoll application. It currently describes
Recoll 1.18.

----------------------------------------------------------------------

Table of Contents

1. Introduction

...

2.3.2. Index case and diacritics sensitivity

2.3.3. The index configuration GUI

2.4. Index WEB visited page history

2.5. Periodic indexing

2.5.1. Running indexing

...

3.1.2. The default result list

3.1.3. The result table

3.1.4. Displaying thumbnails

3.1.5. The preview window

3.1.6. Complex/advanced search

3.1.7. The term explorer tool

3.1.8. Multiple indexes

3.1.9. Document history

3.1.10. Sorting search results and collapsing duplicates

3.1.11. Search tips, shortcuts

3.1.12. Customizing the search interface

3.2. Searching with the KDE KIO slave

3.2.1. What's this

...

4.1. Writing a document filter

4.1.1. Simple filters

4.1.2. "Multiple" filters

4.1.3. Telling Recoll about the filter

4.1.4. Filter HTML output

4.1.5. Page numbers

4.2. Field data processing

4.3. API

...

5.4.5. The mimeview file

5.4.6. Examples of configuration adjustments

Chapter 1. Introduction

1.1. Giving it a try

If you do not like reading manuals (who does?) and would like to give
Recoll a try, just install the application and start the recoll graphical
...
area.

Also be aware that you may need to install the appropriate supporting
applications for document types that need them (for example antiword for
Microsoft Word files).

1.2. Full text search

Recoll is a full text search application. Full text search applications
let you find your data by content rather than by external attributes (like
...

Stemming, by itself, does not accommodate misspellings or phonetic
searches. Recoll supports these features through a specific tool (the term
explorer) which lets you explore the set of index terms in different
modes.

1.3. Recoll overview

Recoll uses the Xapian information retrieval library as its storage and
retrieval engine. Xapian is a very mature package using a sophisticated
...
options to help you find what you are looking for. However, there are
other ways to perform Recoll searches: mainly a command line interface, a
Python programming interface, a KDE KIO slave module, and an Ubuntu Unity
Lens module.

Chapter 2. Indexing

2.1. Introduction

Indexing is the process by which the set of documents is analyzed and the
data entered into the database. Recoll indexing is normally incremental:
...
-z or -Z).

The following sections give an overview of different aspects of the
indexing processes and configuration, with links to detailed sections.

2.1.1. Indexing modes

Recoll indexing can be performed in two different modes:

  o Periodic (or batch) indexing: indexing takes place at discrete times,
    by executing the recollindex command. The typical usage is to have a
    nightly indexing run programmed into your cron file.

  o Real time indexing: indexing takes place as soon as a file is created
    or changed. recollindex runs as a daemon and uses a file system
    alteration monitor such as inotify, Fam or Gamin to detect file
    changes.

The choice between the two methods is mostly a matter of preference, and
...
indexing on a big documentation directory, and real time indexing on a
small home directory). Monitoring a big file system tree can consume
significant system resources.

The choice of method and the parameters used can be configured from the
recoll GUI: Preferences -> Indexing schedule
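As a rough illustration of the two modes, the shell sketch below starts either a single indexing pass or the real time daemon. The function name start_indexing and the RECOLLINDEX variable, which only serves to substitute a dry-run command such as echo, are hypothetical. Only recollindex and its -m option come from this manual.

```shell
#!/bin/sh
# Hypothetical helper: start Recoll indexing in one of the two modes
# described above. RECOLLINDEX is not a Recoll variable; it only lets
# you substitute another command (for example echo) for a dry run.
start_indexing() {
    indexer=${RECOLLINDEX:-recollindex}
    case "$1" in
        batch)
            # Periodic mode: a single indexing pass, then exit.
            "$indexer"
            ;;
        monitor)
            # Real time mode: recollindex -m detaches from the terminal
            # and keeps monitoring file changes as a daemon.
            "$indexer" -m
            ;;
        *)
            echo "usage: start_indexing batch|monitor" >&2
            return 2
            ;;
    esac
}
```

For example, start_indexing monitor starts the daemon for the default configuration, while start_indexing batch corresponds to what a nightly cron job would run.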

2.1.2. Configurations, multiple indexes

The parameters describing what is to be indexed and local preferences are
defined in text files contained in a configuration directory.
...

For index generation, multiple configurations are totally independent of
each other. When multiple indexes need to be used for a single search,
some parameters should be consistent among the configurations.

2.1.3. Document types

Recoll knows about quite a few different document types. The parameters
for document type recognition and processing are set in configuration
files.
...

Other file types (e.g. PostScript, PDF, MS Word, RTF...) need external
applications for preprocessing. The list is in the installation section.
After every indexing operation, Recoll updates a list of commands that
would be needed for indexing existing file types. This list can be
displayed by selecting the menu option File -> Show Missing Helpers in the
recoll GUI. It is stored in the missing text file inside the configuration
directory.
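The small sketch below checks that list from a shell. It assumes the default configuration directory ($HOME/.recoll) unless RECOLL_CONFDIR is set. The helper name is hypothetical. The format of the missing file is not described here, so the sketch just prints it.

```shell
#!/bin/sh
# Hypothetical helper: print the list of missing external helpers that the
# last indexing pass recorded in the "missing" file of the configuration
# directory (default $HOME/.recoll, or RECOLL_CONFDIR if set).
show_missing_helpers() {
    missing=${RECOLL_CONFDIR:-$HOME/.recoll}/missing
    if [ -s "$missing" ]; then
        cat "$missing"
    else
        echo "nothing recorded in $missing" >&2
        return 1
    fi
}
```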

2.1.4. Recovery

In the rare case where the index becomes corrupted (which may show itself
through weird search results or crashes), the index files need to be
erased before restarting a clean indexing pass. Just delete the xapiandb
directory (see next section), or, alternatively, start the next
recollindex with the -z option, which will reset the database before
indexing.

2.2. Index storage

The default location for the index data is the xapiandb subdirectory of
the Recoll configuration directory, typically $HOME/.recoll/xapiandb/.
This can be changed via two different methods (with different purposes):

  o You can specify a different configuration directory by setting the
    RECOLL_CONFDIR environment variable, or using the -c option to the
    Recoll commands. This method would typically be used to index
    different areas of the file system to different indexes. For example,
    if you were to issue the following commands:

...

    Using multiple configuration directories and configuration options
    allows you to tailor multiple configurations and indexes to handle
    whatever subset of the available data you wish to make searchable.

  o For a given configuration directory, you can specify a non-default
    storage location for the index by setting the dbdir parameter in the
    configuration file (see the configuration section). This method would
    mainly be of use if you wanted to keep the configuration directory in
    its default location, but desired another location for the index,
    typically because of disk space concerns.

...

The index data directory (xapiandb) only contains data that can be
completely rebuilt by an index run (as long as the original documents
exist), and it can always be destroyed safely.
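The following is a minimal sketch of the two methods described in this section, assuming a POSIX shell. It creates a separate configuration directory for one area of the file system, optionally moves its index with dbdir, and indexes it through the -c option. topdirs (the list of top directories to index) is a standard Recoll configuration variable. The helper name, the example paths, and the RECOLLINDEX dry-run override are hypothetical. The sketch also assumes that settings not written to recoll.conf are taken from the system-wide defaults.

```shell
#!/bin/sh
# Hypothetical helper: set up a configuration directory that indexes a
# single tree, then run an indexing pass on it.
#   $1: configuration directory   $2: top directory to index
#   $3: optional index location (dbdir), if not the default xapiandb
index_area() {
    confdir=$1 topdir=$2 dbdir=${3:-}
    mkdir -p "$confdir" || return 1
    # Write a minimal recoll.conf only once, so that later manual or GUI
    # edits are preserved.
    if [ ! -f "$confdir/recoll.conf" ]; then
        echo "topdirs = $topdir" > "$confdir/recoll.conf"
        if [ -n "$dbdir" ]; then
            echo "dbdir = $dbdir" >> "$confdir/recoll.conf"
        fi
    fi
    # -c selects the configuration, like setting RECOLL_CONFDIR would.
    "${RECOLLINDEX:-recollindex}" -c "$confdir"
}
```

For example, index_area ~/.recoll-work ~/work /bigdisk/recoll-work-db would build a second, independent index whose data lives on another disk. A GUI started with recoll -c ~/.recoll-work would then search it.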

2.2.1. Xapian index formats

Xapian versions usually support several formats for index storage. A given
major Xapian version will have a current format, used to create new
indexes, and will also support the format from the previous major version.
...

Using the -z option to recollindex is not sufficient to change the format:
you will have to delete all files inside the index directory (typically
~/.recoll/xapiandb) before starting the indexing.
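The full reset described above (and in the Recovery section) could be scripted as follows. This sketch makes two assumptions. First, the index lives in the default xapiandb subdirectory of the configuration directory (a dbdir setting would move it). Second, no other recollindex process, such as a real time monitor, is using the index. The helper name and the RECOLLINDEX override are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: delete the whole index, then rebuild it from
# scratch, so that the new index uses the current Xapian format.
# Assumes the default index location (xapiandb inside the configuration
# directory) and that no recollindex process is running on it.
rebuild_index() {
    confdir=${RECOLL_CONFDIR:-$HOME/.recoll}
    # The index only holds data that the next pass can rebuild.
    rm -rf -- "$confdir/xapiandb" || return 1
    "${RECOLLINDEX:-recollindex}" -c "$confdir"
}
```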

2.2.2. Security aspects

The Recoll index does not hold copies of the indexed documents. However, it
does hold enough data to allow for an almost complete reconstruction. If
confidential data is indexed, access to the database directory should be
...
in appropriate protection.

If you use another setup, you should think about the kind of protection you
need for your index, set the directory and file access modes
appropriately, and possibly adjust the umask used during index updates.
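For a personal index, the advice above might translate into the following sketch. It removes group and other access from the configuration directory (which holds the index by default), then runs indexing under a umask of 077 so that newly created index files are private. The helper name and the RECOLLINDEX override are hypothetical. Adapt the modes if the index is meant to be shared.

```shell
#!/bin/sh
# Hypothetical helper: restrict an existing index to its owner, then run
# an indexing pass with a umask that keeps newly created files private.
# Assumes the index is inside the configuration directory (the default).
protect_and_index() {
    confdir=${RECOLL_CONFDIR:-$HOME/.recoll}
    # Remove every group and other permission, recursively.
    chmod -R go-rwx "$confdir" || return 1
    # The subshell keeps the umask change local to the indexing run.
    (
        umask 077
        "${RECOLLINDEX:-recollindex}" -c "$confdir"
    )
}
```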

2.3. Index configuration

Variables set inside the Recoll configuration files control which areas of
the file system are indexed, and how files are processed. These variables
...
section.

As of Recoll 1.18 there are two incompatible types of Recoll indexes,
depending on the treatment of character case and diacritics. The next
section describes the two types in more detail.

2.3.1. Multiple indexes

Multiple Recoll indexes can be created by using several configuration
directories which are usually set to index different areas of the file
...
Most importantly, all indexes to be queried concurrently must have the
same option concerning character case and diacritics stripping, but there
are other constraints. Most of the relevant parameters are described in
the linked section.

2.3.2. Index case and diacritics sensitivity

As of Recoll version 1.18 you have a choice of building an index with
terms stripped of character case and diacritics, or one with raw terms.
For a source term of Resume, the former will store resume, the latter
...
As a cost for added capability, a raw index will be slightly bigger than a
stripped one (around 10%). Also, searches will be more complex, so
probably slightly slower, and the feature is still young, so a certain
amount of odd behavior cannot be excluded.

2.3.3. The index configuration GUI

Most parameters for a given index configuration can be set from a recoll
GUI running on this configuration (either as default, or by setting
RECOLL_CONFDIR or the -c option).

The interface is started from the Preferences -> Index Configuration menu
entry. It is divided into four tabs: Global parameters, Local parameters,
Web history (which is explained in the next section) and Search
parameters.

The Global parameters tab allows setting global variables, like the lists
of top directories, skipped paths, or stemming languages.

...
The configuration tool normally respects the comments and most of the
formatting inside the configuration file, so that it is quite possible to
use it on hand-edited files, which you might nevertheless want to back up
first.

2.4. Index WEB visited page history

With the help of a Firefox extension, Recoll can index the Internet pages
that you visit. The extension was initially designed for the Beagle
indexer, but it has recently been renamed and better adapted to Recoll.

The extension works by copying visited web pages to an indexing queue
directory, which Recoll then processes: it indexes the data, stores it in
a local cache, then removes the file from the queue.

This feature can be enabled in the GUI Index configuration panel, or by
editing the configuration file (set processwebqueue to 1).
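If you prefer to edit the file from a script, here is a sketch of a helper that sets processwebqueue (or any other global parameter) in recoll.conf. Recoll configuration files can contain subsections introduced by a bracketed directory path, and settings inside a subsection apply only to that part of the tree. To keep the value global, the sketch removes earlier global assignments of the same name and writes the new one at the top of the file, leaving subsections untouched. The helper name is hypothetical. The parameter name is assumed to contain no regular expression metacharacters.

```shell
#!/bin/sh
# Hypothetical helper: set a global "name = value" parameter in recoll.conf.
# Earlier global assignments of the same name are dropped; comments and
# lines inside [directory] subsections are kept as they are.
# Note: awk -v interprets backslash escapes in the value.
set_recoll_param() {
    conf=${RECOLL_CONFDIR:-$HOME/.recoll}/recoll.conf
    tmpf=$conf.tmp.$$
    touch "$conf" || return 1
    awk -v n="$1" -v v="$2" '
        BEGIN { print n " = " v }       # new value goes in the global part
        /^[ \t]*\[/ { insection = 1 }   # a subsection header starts here
        !insection && $0 ~ ("^[ \t]*" n "[ \t]*=") { next }  # old value
        { print }
    ' "$conf" > "$tmpf" || { rm -f "$tmpf"; return 1; }
    mv "$tmpf" "$conf"
}
```

For example, set_recoll_param processwebqueue 1 enables the feature for the default configuration. As with any automated edit, it is worth backing up the file first.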

A current pointer to the extension can be found, along with up-to-date
instructions, on the Recoll wiki.

A copy of the indexed web pages is retained by Recoll in a local cache
(from which previews can be fetched). The cache size can be adjusted from
the Index configuration / Web history panel. Once the maximum size is
reached, old pages are purged (both from the cache and the index) to make
room for new ones, so you need to explicitly archive the pages that you
want to keep indefinitely in some other place.

2.5. Periodic indexing

2.5.1. Running indexing

...
start indexing (except if canceled).

The recollindex indexing process can be interrupted by sending an
interrupt (Ctrl-C, SIGINT) or terminate (SIGTERM) signal. Some time may
elapse before the process exits, because it needs to properly flush and
close the index. This can also be done from the recoll GUI File -> Stop
Indexing menu entry.

After such an interruption, the index will be somewhat inconsistent
because some operations which are normally performed at the end of the
indexing pass will have been skipped (for example, the stemming and
...
file selection process for some area of the file system, by adding the top
directory to the skippedPaths list and using an appropriate file selection
method to build the file list to be fed to recollindex -if. Trivial
example:

find . -name indexable.txt -print | recollindex -if

recollindex -i will not descend into subdirectories specified as
parameters, but just add them as index entries. It is up to the external
file selection method to build the complete file list.
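The following sketch makes the file selection idea concrete. It lists only regular files (never directories, since recollindex -i does not descend into them) whose names match a pattern and which were modified within the last few days. The function name and its parameters are hypothetical. The list is newline-separated, so file names containing newlines would not be handled.

```shell
#!/bin/sh
# Hypothetical file selection method for recollindex -if.
#   $1: top of the tree (assumed to be listed in skippedPaths)
#   $2: file name pattern   $3: maximum age in days
# Only regular files are printed, one per line: recollindex -i would not
# descend into directories anyway.
select_recent_files() {
    find "$1" -type f -name "$2" -mtime "-$3" -print
}

# Typical use:
#   select_recent_files "$HOME/projects" '*.txt' 7 | recollindex -if
```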

2.5.2. Using cron to automate indexing

The most common way to set up indexing is to have a cron task execute it
every night. For example the following crontab entry would do it every day
at 3:30AM (supposing recollindex is in your PATH):
...
Or, using anacron:

1 15 recollindex su mylogin -c "recollindex > /tmp/rcltraceme 2>&1"

As of version 1.17 the Recoll GUI has dialogs to manage crontab entries
for recollindex. You can reach them from the Preferences -> Indexing
Schedule menu. They only work with the good old cron, and do not give
access to all features of cron scheduling.

The usual command to edit your crontab is crontab -e (which will usually
start the vi editor to edit the file). You may have more sophisticated
...
Please be aware that there may be differences between your usual
interactive command line environment and the one seen by crontab commands.
In particular, the PATH variable may be of concern. Please check the
crontab manual pages about possible issues.
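One way to avoid the PATH problem is to set PATH explicitly in the cron entry itself. The sketch below is a hypothetical filter that reads an existing crontab on its standard input. It prints the crontab back with exactly one nightly (3:30 AM) recollindex entry, dropping any earlier line that mentions recollindex, including comments. It would therefore also replace an entry previously created from the GUI. The PATH directories and the trace file location are assumptions to adapt to your installation.

```shell
#!/bin/sh
# Hypothetical filter: copy a crontab from stdin to stdout, replacing any
# line that mentions recollindex with a single nightly entry (3:30 AM).
# PATH is set in the entry itself because cron usually provides a minimal
# environment; the directories listed here are assumptions.
add_recoll_cron_entry() {
    # grep exits with status 1 when no line is selected, which is fine.
    grep -v 'recollindex' || [ $? -eq 1 ] || return 1
    echo '30 3 * * * PATH=/usr/local/bin:/usr/bin:/bin recollindex > /tmp/rcltraceme 2>&1'
}

# Typical use:
#   crontab -l 2>/dev/null | add_recoll_cron_entry | crontab -
```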

2.6. Real time indexing

Real time monitoring/indexing is performed by starting the recollindex -m
command. With this option, recollindex will detach from the terminal and
become a daemon, permanently monitoring file changes and updating the
... |
|
... |
recollconf=$HOME/.recoll-home
recolldata=/usr/local/share/recoll
RECOLL_CONFDIR=$recollconf $recolldata/examples/rclmon.sh start

fvwm

The indexing daemon gets started, then the window manager, for which the
session waits.

By default the indexing daemon will monitor the state of the X11 session,
...
email folders change. Also, monitoring large file trees by itself
significantly taxes system resources. You probably do not want to enable
it if your system is short on resources. Periodic indexing is adequate in
most cases.

2.6.1. Slowing down the reindexing rate for fast changing files

When using the real time monitor, it may happen that some files need to be
indexed, but change so often that they impose an excessive load on the
system.

Recoll provides a configuration option to specify the minimum time that
must elapse before a file matching a given wildcard pattern can be
reindexed. See the mondelaypatterns parameter in the configuration section.

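A minimal sketch of the lookup behind this option follows, in shell. It only illustrates the idea and is not Recoll's code: it assumes each entry is a wildcard pattern and a number of seconds separated by a colon (check the mondelaypatterns description in the configuration section for the exact syntax and quoting rules), that the first matching pattern wins, and that a file matching no pattern gets no extra delay. The mon_delay_for function is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper illustrating the mondelaypatterns idea: given a file
# path and a list of "pattern:seconds" entries, print the minimum delay
# (in seconds) before the file may be reindexed again.
# Assumptions: the first matching pattern wins; no match means no delay (0).
mon_delay_for() {
    path=$1
    shift
    for entry in "$@"; do
        pat=${entry%:*}     # everything before the last ':'
        secs=${entry##*:}   # everything after the last ':'
        # An unquoted variable in a case branch acts as a wildcard pattern,
        # and '*' also matches '/' (like fnmatch() without FNM_PATHNAME).
        case "$path" in
            $pat) echo "$secs"; return 0 ;;
        esac
    done
    echo 0
}
```

For example, mon_delay_for /var/log/app.log '*.log:20' '*.tmp:60' prints 20, while a file matching none of the patterns gets 0.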
Chapter 3. Searching

3.1. Searching with the Qt graphical user interface

The recoll program provides the main user interface for searching. It is
based on the Qt library.

recoll has two search modes:

o Simple search (the default, on the main screen) has a single entry
field where you can enter multiple words.

o Advanced search (a panel accessed through the Tools menu or the
toolbox bar icon) has multiple entry fields, which you may use to
build a logical condition, with additional filtering on file type,
location in the file system, modification date, and size.

In most cases, you can enter the terms as you think them, even if they
...
printed is for East Asian languages (Chinese, Japanese, Korean). Words
composed of single or multiple characters should be entered separated by
white space in this case (they would typically be printed without white
space).

3.1.1. Simple search

1. Start the recoll program.

2. Possibly choose a search mode: Any term, All terms, File name or Query
...

File name will specifically look for file names. The point of having a
separate file name search is that wild card expansion can be performed
more efficiently on a small subset of the index (allowing wild cards on
the left of terms without excessive penalty). Things to know:

o White space in the entry should match white space in the file name,
and is not treated specially.

o The search is insensitive to character case and accents, independently
of the type of index.

o An entry without any wild card character and not capitalized will be
prepended and appended with '*' (e.g. etc -> *etc*, but Etc -> etc).

o If you have a big index (many files), excessively generic fragments
may result in inefficient searches.

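The wildcard rule from the list above can be sketched as a small shell function. This is a hypothetical helper (fname_pattern) that restates the documented rule, not Recoll's implementation; it assumes *, ? and [ are the wildcard characters, and shows the capitalized case lowercased only because matching ignores case anyway.

```shell
#!/bin/sh
# Hypothetical sketch of the File name search rule described above:
#  - an entry containing a wildcard character (*, ? or [) is used as-is;
#  - a capitalized entry gets no wildcards added (shown lowercased, since
#    matching is case-insensitive);
#  - any other entry is wrapped in '*' so that it matches anywhere in a name.
fname_pattern() {
    case "$1" in
        *'*'* | *'?'* | *'['*)
            printf '%s\n' "$1"
            ;;
        [[:upper:]]*)
            printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
            ;;
        *)
            printf '%s\n' "*$1*"
            ;;
    esac
}
```

With this sketch, fname_pattern etc prints *etc*, fname_pattern Etc prints etc, and an entry such as et*.conf is left unchanged.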
You can search for exact phrases (adjacent words in a given order) by
enclosing the input inside double quotes. For example: "virtual reality".

...
punctuation, newlines and all - except for wildcard characters (single ?
characters are ok). Recoll will process it and produce a meaningful
search. This is what most differentiates this mode from the Query Language
mode, where you have to care about the syntax.

You can use the Tools -> Advanced search dialog for more complex searches.

3.1.2. The default result list

After starting a search, a list of results will instantly be displayed in
the main list window.
...
open tabs in the existing preview window. You can use Shift+Click to force
the creation of another preview window, which may be useful to view the
documents side by side. (You can also browse successive results in a
single preview window by typing Shift+ArrowUp/Down in the window).

Clicking the Open link will start an external viewer for the document. By
default, Recoll lets the desktop choose the appropriate application for
most document types (there is a short list of exceptions, see further). If
you prefer to completely customize the choice of applications, you can
uncheck the Use desktop preferences option in the GUI preferences dialog,
and click the Choose editor applications button to adjust the predefined
Recoll choices. The tool accepts multiple selections of mime types (e.g.
to set up the editor for the dozens of office file types).

Even when Use desktop preferences is checked, there is a small list of
exceptions, for mime types where the Recoll choice should override the
desktop one. These are applications which are well integrated with Recoll,
especially evince for viewing PDF and Postscript files because of its
support for opening the document at a specific page and passing a search
string as an argument. Of course, you can edit the list (in the GUI
preferences) if you would prefer to lose the functionality and use the
standard desktop tool.

You may also change the choice of applications by editing the mimeview
configuration file if you find this more convenient.

The Preview and Open links may not be present for all entries,
meaning that Recoll has no configured way to preview a given file type
(which was indexed by name only), or no configured external editor for the
file type. This can sometimes be adjusted simply by tweaking the mimemap
...

The result list is divided into pages (the size of which you can change in
the preferences). Use the arrow buttons in the toolbar or the links at the
bottom of the page to browse the results.

3.1.2.1. No results: the spelling suggestions

When a search yields no result, and if the aspell dictionary is
configured, Recoll will try to check for misspellings among the query
terms, and will propose lists of replacements. Clicking on one of the
suggestions will replace the word and restart the search. You can hold any
of the modifier keys (Ctrl, Shift, etc.) while clicking if you would
rather stay on the suggestion screen because several terms need
replacement.

3.1.2.2. The result list right-click menu

Apart from the Preview and Open links, you can display a pop-up menu by
right-clicking over a paragraph in the result list. This menu has the
following entries:

o Preview

o Open

o Copy File Name

o Copy Url

o Save to File

o Find similar

o Preview Parent document

o Open Parent document

o Open Snippets Window

The Preview and Open entries do the same thing as the corresponding links.

The Copy File Name and Copy Url entries copy the relevant data to the
clipboard, for later pasting.

...
lists extracts from the document, taken around occurrences of the search
terms, along with the corresponding page number, as links which can be
used to start the native viewer on the appropriate page. If the viewer
supports it, its search function will also be primed with one of the
search terms.

3.1.3. The result table

In Recoll 1.15 and newer, the results can be displayed in spreadsheet-like
fashion. You can switch to this presentation by clicking the table-like
icon in the toolbar (this is a toggle, click again to restore the list).

...
window with the corresponding values. You can click the row to freeze the
display. The bottom area is equivalent to a result list paragraph, with
links for starting a preview or a native application, and an equivalent
right-click menu. Typing Esc (the Escape key) will unfreeze the display.

3.1.4. Displaying thumbnails

The default result list entries and the detail area of the result table
display an icon for each result document. The icon is either a generic
one determined from the MIME type, or a thumbnail of the document
appearance. Thumbnails are only displayed if found in the standard
freedesktop location, where they would typically have been created by a
file manager.

Recoll has no capability to create thumbnails. A relatively simple trick
is to use the Open parent document/folder entry in the result list popup
menu. This should open a file manager window on the containing directory,
which should in turn create the thumbnails (depending on your settings).
Restarting the search should then display the thumbnails.

There are also some pointers about thumbnail generation on the Recoll
wiki.

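To check whether a thumbnail exists for a given file, you can compute where the freedesktop.org Thumbnail Managing Standard would store it: a PNG file named after the MD5 hash of the file's URI, in the "normal" (128x128) size directory. The sketch below is not Recoll code. It assumes the older ~/.thumbnails root (newer desktops use $XDG_CACHE_HOME/thumbnails, typically ~/.cache/thumbnails, which you can select through the hypothetical THUMBNAIL_ROOT variable), GNU md5sum, and an absolute path containing no characters that would need percent-encoding in the URI.

```shell
#!/bin/sh
# Print the path where a freedesktop-compliant file manager would keep the
# "normal" size thumbnail for FILE (an absolute path).
# Assumptions: GNU md5sum is available; FILE needs no percent-encoding in
# its file:// URI; THUMBNAIL_ROOT (hypothetical) overrides the base dir.
thumbnail_path() {
    root=${THUMBNAIL_ROOT:-$HOME/.thumbnails}
    # The thumbnail name is the MD5 of the URI, in lowercase hex, plus .png
    hash=$(printf '%s' "file://$1" | md5sum | cut -d ' ' -f 1)
    printf '%s/normal/%s.png\n' "$root" "$hash"
}
```

For example, [ -f "$(thumbnail_path "$PWD/report.pdf")" ] tells you whether a file manager has already created the thumbnail that Recoll could display.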
3.1.5. The preview window

The preview window opens when you first click a Preview link inside the
result list.

Subsequent preview requests for a given search open new tabs in the
...
metadata stored in the index.

You can print the current preview window contents by typing Ctrl-P (Ctrl +
P) in the window text.

3.1.5.1. Searching inside the preview

The preview window has an internal search capability, mostly controlled by
the panel at the bottom of the window, which works in two modes: as a
classical editor incremental search, where we look for the text entered in
the entry zone, or as a way to walk the matches between the document and
...
list for this group will be walked. This is not the same as a text
search, because the occurrences will include non-exact matches (as
caused by stemming or wildcards). The search will revert to the
text mode as soon as you edit the entry area.

3.1.6. Complex/advanced search

The advanced search dialog helps you build more complex queries without
memorizing the search language constructs. It can be opened through the
Tools menu or through the main toolbar.

...
always performs a simple search.

Click on the Show query details link at the top of the result page to see
the query expansion.

3.1.6.1. Advanced search: the "find" tab

This part of the dialog lets you construct a query by combining multiple
clauses of different types. Each entry field is configurable for the
following modes:

o All terms.

o Any term.

o None of the terms.

o Phrase (exact terms in order within an adjustable window).

o Proximity (terms in any order within an adjustable window).

o Filename search.

Additional entry fields can be created by clicking the Add clause button.

When searching, the non-empty clauses will be combined either with an AND
or an OR conjunction, depending on the choice made on the left (All
...
quick fox with a slack of 0 will match quick fox but not quick brown fox.
With a slack of 1 it will match the latter, but not fox quick. A proximity
search for quick fox with the default slack will match the latter, and
also a fox is a cunning and quick animal.

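The slack definitions above can be checked with a toy matcher over a plain list of words. This illustrates the stated semantics only and is not how Recoll evaluates phrase or proximity clauses against its index; the near function is hypothetical, it ignores stemming and punctuation, and the slack of 10 used in the proximity example is an arbitrary choice, not a statement about Recoll's default.

```shell
#!/bin/sh
# Toy matcher: succeed if TEXT contains term A followed by term B with at
# most SLACK other words in between. If ORDERED is 0, B may also come
# before A (proximity). Words are split on white space and compared
# case-insensitively; punctuation and stemming are not handled.
# Usage: near TEXT A B SLACK ORDERED
near() {
    awk -v text="$1" -v a="$2" -v b="$3" -v s="$4" -v ordered="$5" 'BEGIN {
        a = tolower(a); b = tolower(b)
        n = split(tolower(text), w)
        for (i = 1; i <= n; i++)
            for (j = i + 1; j <= n && j <= i + 1 + s; j++) {
                if (w[i] == a && w[j] == b) exit 0
                if (ordered + 0 == 0 && w[i] == b && w[j] == a) exit 0
            }
        exit 1
    }'
}
```

For instance, near "quick brown fox" quick fox 1 1 succeeds, while the same call with a slack of 0 fails, matching the phrase examples above.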
3.1.6.2. Advanced search: the "filter" tab

This part of the dialog has several sections which allow filtering the
results of a search according to a number of criteria:

o The first section allows filtering by dates of last modification. You
can specify both a minimum and a maximum date. The initial values are
set according to the oldest and newest documents found in the index.

o The next section allows filtering the results by file size. There are
two entries for minimum and maximum size. Enter decimal numbers. You
can use suffix multipliers: k/K, m/M, g/G, t/T for 1E3, 1E6, 1E9, 1E12
respectively.

o The next section allows filtering the results by their mime types, or
mime categories (e.g. media/text/message/etc.).

You can transfer the types between two boxes, to define which will be
included or excluded by the search.

The state of the file type selection can be saved as the default (the
file type filter will not be activated at program start-up, but the
lists will be in the restored state).

o The bottom section allows restricting the search results to a sub-tree
of the indexed area. You can use the Invert checkbox to search for
files not in the sub-tree instead. If you use directory filtering
often and on big subsets of the file system, you may want to consider
setting up multiple indexes instead, as the performance may be better.

You can use relative/partial paths for filtering. For example, entering
dirA/dirB would match either /dir1/dirA/dirB/myfile1 or
/dir2/dirA/dirB/someother/myfile2.

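Converting a size entry with a suffix multiplier into a byte count can be sketched as below. The size_to_bytes function is hypothetical; it assumes at most one suffix letter directly after the number and uses the decimal (powers of 1000) multipliers listed above. The arithmetic is done in awk so that fractional entries such as 1.5M also work.

```shell
#!/bin/sh
# Hypothetical helper: convert a size such as 10k, 1.5M or 2T to bytes,
# using the decimal multipliers k=1E3, m=1E6, g=1E9, t=1E12 (either case).
size_to_bytes() {
    num=${1%[kKmMgGtT]}   # strip one trailing multiplier letter, if any
    case "$1" in
        *[kK]) mult=1000 ;;
        *[mM]) mult=1000000 ;;
        *[gG]) mult=1000000000 ;;
        *[tT]) mult=1000000000000 ;;
        *)     mult=1 ;;
    esac
    # awk uses floating point, so decimal inputs like 1.5 are accepted.
    awk -v n="$num" -v m="$mult" 'BEGIN { printf "%.0f\n", n * m }'
}
```

Note that these are powers of 1000, so 1k is 1000 bytes, not 1024.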
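The partial-path matching described in the last item of the filter tab can be sketched as a shell test on a result's file path. This is a hypothetical helper (dir_filter_match) built on assumptions, not Recoll's code: a relative filter is taken to match when it appears as complete directory components anywhere in the path, and an absolute filter is treated as a plain directory prefix. The Invert checkbox would correspond to negating the result.

```shell
#!/bin/sh
# Hypothetical sketch of directory filtering for search results.
# Usage: dir_filter_match FILTER FILE_PATH   (status 0 = FILE_PATH kept)
dir_filter_match() {
    filter=${1%/}   # tolerate a trailing slash in the filter
    case "$filter" in
        /*)
            # Absolute filter: the file must be below that directory.
            case "$2" in "$filter"/*) return 0 ;; esac
            ;;
        *)
            # Relative/partial filter: whole path components, anywhere.
            case "$2" in */"$filter"/*) return 0 ;; esac
            ;;
    esac
    return 1
}
```

With the example paths from the manual, dir_filter_match dirA/dirB /dir2/dirA/dirB/someother/myfile2 succeeds, while /dir3/xdirA/dirB/f is rejected because xdirA is not the component dirA.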
3.1.6.3. Advanced search history

The advanced search tool memorizes the last 100 searches performed. You
can walk the saved searches by using the up and down arrow keys while the
keyboard focus belongs to the advanced search dialog.

The complex search history can be erased, along with the one for simple
search, by selecting the File -> Erase Search History menu entry.

3.1.7. The term explorer tool

Recoll automatically manages the expansion of search terms to their
derivatives (e.g. plural/singular, verb inflections). But there are other
cases where the exact search term is not known. For example, you may not
remember the exact spelling, or only know the beginning of the name.

...

Double-clicking on a term in the result list will insert it into the
simple search entry field. You can also cut/paste between the result list
and any entry field (the end of lines will be taken care of).

3.1.8. Multiple indexes

See the section describing the use of multiple indexes for generalities.
Only the aspects concerning the recoll GUI are described here.

A recoll program instance is always associated with a specific index,
...

RECOLL_ACTIVE_EXTRA_DBS is available for Recoll versions 1.17.2 and later.
A change was made in the same update so that recoll will automatically
deactivate unreachable indexes when starting up.

3.1.9. Document history

Documents that you actually view (with the internal preview or an external
tool) are entered into the document history, which is remembered.

You can display the history list by using the Tools -> Doc History menu
entry.

You can erase the document history by using the Erase document history
entry in the File menu.

3.1.10. Sorting search results and collapsing duplicates

The documents in a result list are normally sorted in order of relevance.
It is possible to specify a different sort order, either by using the
vertical arrows in the GUI toolbox to sort by date, or switching to the
result table display and clicking on any header. The sort order chosen
...
1380 |
identity is based on an MD5 hash of the document container, not only of
|
1353 |
identity is based on an MD5 hash of the document container, not only of
|
1381 |
the text contents (so that ie, a text document with an image added will
|
1354 |
the text contents (so that ie, a text document with an image added will
|
1382 |
not be a duplicate of the text only). Duplicates hiding is controlled by
|
1355 |
not be a duplicate of the text only). Duplicates hiding is controlled by
|
1383 |
an entry in the GUI configuration dialog, and is off by default.
|
1356 |
an entry in the GUI configuration dialog, and is off by default.
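
The duplicate rule can be shown with a short sketch. Results are grouped by an MD5 digest of the whole container. A text file and the same text with an added image therefore produce different digests, and both are kept. The result dictionaries and the function name are hypothetical and do not reflect Recoll's internal data structures.

```python
import hashlib
from typing import Dict, List


def collapse_duplicates(results: List[Dict]) -> List[Dict]:
    """Keep the first result for each distinct container MD5 digest.

    Each result is a hypothetical dict with a "url" and the raw container
    bytes under "data". The input order (e.g. relevance) is preserved.
    """
    seen = set()
    kept = []
    for res in results:
        digest = hashlib.md5(res["data"]).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(res)
    return kept
```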

----------------------------------------------------------------------

3.1.11. Search tips, shortcuts

3.1.11.1. Terms and search expansion

Term completion. Typing Esc Space in the simple search entry field while
entering a word will either complete the current word if its beginning
matches a unique term in the index, or open a window to propose a list of
completions.
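
The completion behaviour can be sketched as a prefix lookup over a sorted term list. One match means the word can be completed; several matches mean a list of choices. Recoll reads the terms from its Xapian index. The sorted Python list and the function name used here are stand-ins for illustration only.

```python
import bisect
from typing import List


def completions(prefix: str, sorted_terms: List[str]) -> List[str]:
    """Return all terms starting with prefix.

    sorted_terms must be sorted. Terms sharing a prefix form a contiguous
    slice, whose start is found by binary search.
    """
    start = bisect.bisect_left(sorted_terms, prefix)
    found = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # past the contiguous block of matching terms
        found.append(term)
    return found
```

A caller would complete the word when exactly one term comes back and show a choice list otherwise.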

...
index all directories in the file path as terms. This has been abandoned
as it did not seem really useful). Alternatively, you can use the specific
file name search, which will only look for file names, and may be faster
than the generic search, especially when using wildcards.

----------------------------------------------------------------------

3.1.11.2. Working with phrases and proximity

Phrases and Proximity searches. A phrase can be looked for by enclosing it
in double quotes. Example: "user manual" will look only for occurrences of
user immediately followed by manual. You can use the This phrase field of
the advanced search dialog to the same effect. Phrases can be entered
...
IBM. Searching for the word inside a phrase (e.g. "the IBM company") will
only match the dotted abbreviation if you increase the phrase slack (using
the advanced search panel control, or the o query language modifier).
Literal occurrences of the word will be matched normally.
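
To show what the phrase slack allows, here is a pure-Python sketch. It checks whether phrase terms appear in order with at most a given number of extra words in between. Recoll delegates the real matching to Xapian's positional index, and Xapian may count slack differently from this illustration. The function name is hypothetical.

```python
from typing import Dict, List


def phrase_matches(doc_terms: List[str], phrase: List[str], slack: int = 0) -> bool:
    """Return True if the phrase terms occur in order in doc_terms, with at
    most `slack` extra words in total between consecutive phrase terms."""
    if not phrase:
        return True
    positions: Dict[str, List[int]] = {}
    for pos, term in enumerate(doc_terms):
        positions.setdefault(term, []).append(pos)

    def extend(idx: int, prev_pos: int, used: int) -> bool:
        # Try to place phrase[idx] after prev_pos without exceeding the slack.
        if idx == len(phrase):
            return True
        for pos in positions.get(phrase[idx], []):
            gap = pos - prev_pos - 1
            if gap < 0:
                continue  # not after the previous term
            if used + gap > slack:
                break  # positions increase, so later ones are worse
            if extend(idx + 1, pos, used + gap):
                return True
        return False

    return any(extend(1, start, 0) for start in positions.get(phrase[0], []))
```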

----------------------------------------------------------------------

3.1.11.3. Others

Using fields. You can use the query language and field specifications to
only search certain parts of documents. This can be especially helpful
with email, for example only searching emails from a specific originator:
search tips from:helpfulgui

...
Printing previews. Entering Ctrl-P in a preview window will print the
currently displayed text.

Quitting. Entering Ctrl-Q almost anywhere will close the application.

----------------------------------------------------------------------

3.1.12. Customizing the search interface

You can customize some aspects of the search interface by using the GUI
configuration entry in the Preferences menu.

There are several tabs in the dialog, dealing with the interface itself,
the parameters used for searching and returning results, and what indexes
are searched.

User interface parameters:

* Highlight color for query terms: Terms from the user query are
highlighted in the result list samples and the preview window. The
color can be chosen here. Any Qt color string should work (e.g. red,
#ff0000). The default is blue.

* Style sheet: The name of a Qt style sheet text file which is applied
to the whole Recoll application on startup. The default value is
empty, but there is a skeleton style sheet (recoll.qss) inside the
/usr/share/recoll/examples directory. Using a style sheet, you can
change most recoll graphical parameters: colors, fonts, etc. See the
sample file for a few simple examples.

* Maximum text size highlighted for preview: Inserting highlights on
search terms inside the text before inserting it in the preview window
involves quite a lot of processing, and can be disabled over the given
text size to speed up loading.

* Prefer HTML to plain text for preview: if set, Recoll will display HTML
as such inside the preview window. If this causes problems with the Qt
HTML display, you can uncheck it to display the plain text version
instead.

* Plain text to HTML line style: when displaying plain text inside the
preview window, Recoll tries to preserve some of the original text
line breaks and indentation. It can either use PRE HTML tags, which
will preserve the indentation well but will force horizontal scrolling
for long lines, or use BR tags to break at the original line breaks,
which will let the editor introduce other line breaks according to the
window width, but will lose some of the original indentation. The
third option has been available in recent releases and is probably now
the best one: use PRE tags with line wrapping.

* Use desktop preferences to choose document editor: if this is checked,
the xdg-open utility will be used to open files when you click the
Open link in the result list, instead of the application defined in
mimeview. xdg-open will in turn use your desktop preferences to choose
an appropriate application.

* Exceptions: when using the desktop preferences for opening documents,
these are mime types that will still be opened according to Recoll
preferences. This is useful for passing parameters like page numbers
or search strings to applications that support them (e.g. evince).
This cannot be done with xdg-open, which only supports passing one
parameter.

* Choose editor applications: this will let you choose the command
started by the Open links inside the result list, for specific
document types.

* Display category filter as toolbar...: this will let you choose if the
document categories are displayed as a list or a set of buttons.

* Auto-start simple search on white space entry: if this is checked, a
search will be executed each time you enter a space in the simple
search input field. This lets you look at the result list as you enter
new terms. This is off by default; you may like it or not...

* Start with advanced search dialog open: If you use this dialog
frequently, checking the entry will get it to open when recoll
starts.

* Remember sort activation state: if set, Recoll will remember the sort
tool state between invocations. It normally starts with sorting
disabled.
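
The three line styles can be approximated as follows. This manual does not show the exact markup Recoll generates. Using the CSS white-space: pre-wrap property for the "PRE with wrapping" case is an assumption, and the function name is hypothetical.

```python
import html


def plain_to_html(text: str, style: str) -> str:
    """Render plain text for an HTML preview widget.

    style: "pre"      keeps the layout; long lines scroll horizontally.
           "br"       breaks at the original newlines; the widget wraps.
           "pre-wrap" keeps the layout and wraps long lines.
    """
    escaped = html.escape(text)  # protect <, > and & in the document text
    if style == "pre":
        return "<pre>" + escaped + "</pre>"
    if style == "br":
        return "<br>\n".join(escaped.split("\n"))
    if style == "pre-wrap":
        return '<pre style="white-space: pre-wrap">' + escaped + "</pre>"
    raise ValueError("unknown style: " + style)
```

With the "br" style, the HTML renderer collapses runs of leading spaces, which is why indentation is lost.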

Result list parameters:

* Number of results in a result page

* Result list font: There is quite a lot of information shown in the
result list, and you may want to customize the font and/or font size.
The rest of the fonts used by Recoll are determined by your generic Qt
config (try the qtconfig command).

* Edit result list paragraph format string: allows you to change the
presentation of each result list entry. See the result list
customisation section.

* Edit result page HTML header insert: allows you to define text
inserted at the end of the result page HTML header. More detail in the
result list customisation section.

* Date format: allows specifying the format used for displaying dates
inside the result list. This should be specified as an strftime()
string (man strftime).

* Abstract snippet separator: for synthetic abstracts built from index
data, which are usually made of several snippets from different parts
of the document, this defines the snippet separator, an ellipsis by
default.
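
The sketch below formats a document timestamp with an strftime() format string. The format shown is only an example, not Recoll's default. UTC is used here only to make the output reproducible, and the function name is hypothetical.

```python
import time


def format_result_date(mtime: float, fmt: str = "%Y-%m-%d %H:%M") -> str:
    """Format a Unix timestamp with an strftime() string, in UTC."""
    return time.strftime(fmt, time.gmtime(mtime))
```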

Search parameters:

* Hide duplicate results: decides if result list entries are shown for
identical documents found in different places.

* Stemming language: stemming obviously depends on the document's
language. This listbox will let you choose among the stemming databases
which were built during indexing (this is set in the main
configuration file), or later added with recollindex -s (see the
recollindex manual). Stemming languages which are dynamically added
will be deleted at the next indexing pass unless they are also added
in the configuration file.

* Automatically add phrase to simple searches: a phrase will be
automatically built and added to simple searches when looking for Any
terms. This will give a relevance boost to the results where the
search terms appear as a phrase (consecutive and in order).

* Autophrase term frequency threshold percentage: very frequent terms
should not be included in automatic phrase searches for performance
reasons. The parameter defines the cutoff percentage (percentage of
the documents where the term appears).

* Replace abstracts from documents: this decides if we should synthesize
and display an abstract in place of an explicit abstract found within
the document itself.

* Dynamically build abstracts: this decides if Recoll tries to build
document abstracts (lists of snippets) when displaying the result
list. Abstracts are constructed by taking context from the document
information, around the search terms.

* Synthetic abstract size: adjust to taste...

* Synthetic abstract context words: how many words should be displayed
around each term occurrence.

* Query language magic file name suffixes: a list of words which
automatically get turned into ext:xxx file name suffix clauses when
starting a query language query (e.g. doc xls xlsx...). This will save
some typing for people who use file types a lot when querying.
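
The effect of the frequency threshold on the automatic phrase can be sketched as follows. Terms that appear in too large a percentage of documents are left out before the phrase is built. This version simply drops them. This excerpt does not describe how Recoll actually rebuilds the phrase around excluded terms, and the function name and inputs are hypothetical.

```python
from typing import Dict, List


def build_autophrase(terms: List[str], doc_freq: Dict[str, int],
                     total_docs: int, threshold_pct: float) -> str:
    """Return a quoted phrase made of the terms whose document frequency is
    below threshold_pct percent of the collection, or "" if fewer than two
    terms remain (a one-word phrase adds nothing)."""
    if total_docs <= 0:
        raise ValueError("total_docs must be positive")
    kept = [t for t in terms
            if 100.0 * doc_freq.get(t, 0) / total_docs < threshold_pct]
    if len(kept) < 2:
        return ""
    return '"' + " ".join(kept) + '"'
```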

External indexes: This panel will let you browse for additional indexes
...
always implicitly active. If this is not desirable, you can set up your
configuration so that it indexes, for example, an empty directory. An
alternative indexer may also need to implement a way of purging stale
data from the index.

----------------------------------------------------------------------

3.1.12.1. The result list format

The result list presentation can be exhaustively customized by adjusting
two elements:

* The paragraph format

* HTML code inside the header section

These can be edited from the Result list tab of the GUI configuration.

Newer versions of Recoll (from 1.17) use a WebKit HTML object by default
(this may be disabled at build time), and total customisation is possible
...
WebKit build), if there are restrictions to what you can do, they are
beyond this author's HTML/CSS/Javascript abilities... There are a few
examples on the page about customising the result list on the Recoll web
site.

----------------------------------------------------------------------

The paragraph format

This is an arbitrary HTML string where the following printf-like %
substitutions will be performed:

* %A. Abstract

* %D. Date

* %I. Icon image name. This is normally determined from the mime type.
The associations are defined inside the mimeconf configuration file.
If a thumbnail for the file is found at the standard Freedesktop
location, this will be displayed instead.

* %K. Keywords (if any)

* %L. Precooked Preview, Edit, and possibly Snippets links

* %M. Mime type

* %N. Result number inside the result page

* %R. Relevance percentage

* %S. Size information

* %T. Title or Filename if not set.

* %t. Title or Filename if not set.

* %U. Url

The format of the Preview, Edit, and Snippets links is <a href="P%N">,
<a href="E%N"> and <a href="A%N">, where docnum (%N) expands to the
document number inside the result page.
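
Here is a minimal sketch of the % substitution step, using the letters listed above. In this illustration the caller supplies the values. Escaping of a literal % sign is not handled, and the function name is hypothetical.

```python
import re
from typing import Dict

# Only the substitution letters listed in the manual are recognised.
_SUBST = re.compile(r"%([ADIKLMNRSTtU])")


def expand_paragraph(fmt: str, fields: Dict[str, str]) -> str:
    """Replace each %X with fields["X"]. A letter with no value expands to
    an empty string, and any other % sequence is left untouched."""
    # A replacement function avoids backslash handling in the values.
    return _SUBST.sub(lambda m: fields.get(m.group(1), ""), fmt)
```

In the test below, the P%N link expands to P3, which matches the Preview link format described above.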

...
how they look.

It is also possible to define the value of the snippet separator inside
the abstract section.
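
A synthetic abstract can be assembled from context words and a snippet separator in three steps. A window of context words is taken around each query term hit. Overlapping windows are merged. The resulting snippets are joined with the separator. This sketch illustrates the idea only and is not Recoll's algorithm, which works from index data and also honours the abstract size limit. The function name is hypothetical.

```python
from typing import List, Set


def build_abstract(words: List[str], query_terms: Set[str],
                   context: int = 4, separator: str = "...") -> str:
    """Join windows of `context` words around each query term hit with
    `separator`. Overlapping or adjacent windows are merged."""
    hits = [i for i, w in enumerate(words) if w.lower() in query_terms]
    windows: List[List[int]] = []  # [start, end) word index ranges
    for i in hits:
        lo, hi = max(0, i - context), min(len(words), i + context + 1)
        if windows and lo <= windows[-1][1]:
            windows[-1][1] = max(windows[-1][1], hi)
        else:
            windows.append([lo, hi])
    return separator.join(" ".join(words[lo:hi]) for lo, hi in windows)
```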

----------------------------------------------------------------------

3.2. Searching with the KDE KIO slave

3.2.1. What's this

The Recoll KIO slave allows performing a Recoll search by entering an
...

The instructions for building this module are located in the source tree.
See: kde/kio/recoll/00README.txt. Some Linux distributions do package the
kio-recoll module, so check before diving into the build process: it may
already be out there, ready for one-click installation.

----------------------------------------------------------------------

3.2.2. Searchable documents

As a sample application, the Recoll KIO slave could allow preparing a set
of HTML documents (for example a manual) so that they become their own
...
}
</script>
....
<body ondblclick="recollsearch()">

----------------------------------------------------------------------

3.3. Searching on the command line

There are several ways to obtain search results as a text stream, without
a graphical interface:

* By passing option -t to the recoll program.

* By using the recollq program.

* By writing a custom Python program, using the Recoll Python API.

The first two methods work in the same way and take the same arguments
(except for the additional -t to recoll). The query to be executed is
specified as command line arguments.

...
text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/comptes.html] [comptes.html] 18593 bytes
text/html [file:///Users/uncrypted-dockes/projets/nautique/webnautique/articles/ilur1/index.html] [Constructio...
text/html [file:///Users/uncrypted-dockes/projets/pagepers/index.html] [psxtcl/writemime/recoll]...
text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/recu-chasse-maree....
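
Output shaped like the sample lines above can be consumed from Python. The line format is inferred only from these sample lines, and other recollq options may change it. Truncated lines, such as the ones ending in "...", are not matched. The helper names are hypothetical. recollq is invoked with the query words as plain arguments, as described above.

```python
import re
import subprocess
from typing import List, NamedTuple, Optional

# Lines shaped like: mime-type [url] [title] size bytes
_LINE = re.compile(r"^(\S+)\s+\[([^\]]*)\]\s+\[([^\]]*)\]\s+(\d+)\s+bytes$")


class Hit(NamedTuple):
    mime: str
    url: str
    title: str
    size: int


def parse_line(line: str) -> Optional[Hit]:
    m = _LINE.match(line.strip())
    if m is None:
        return None  # header, blank, truncated or otherwise unrecognised line
    return Hit(m.group(1), m.group(2), m.group(3), int(m.group(4)))


def run_recollq(query_words: List[str]) -> List[Hit]:
    # Requires recollq on PATH. The query is passed as command line arguments.
    out = subprocess.run(["recollq"] + query_words, capture_output=True,
                         text=True, check=True).stdout
    return [h for h in map(parse_line, out.splitlines()) if h is not None]
```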

----------------------------------------------------------------------

3.4. The query language

The query language processor is activated in the GUI simple search entry
when the search mode selector is set to Query Language. It can also be
used with the KIO slave or the command line search. It broadly has the
...
An element is composed of an optional field specification, and a value,
separated by a colon (the field separator is the last colon in the
element). Example: Eugenie, author:balzac, dc:title:grandet
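
Because the field separator is the last colon, a single right partition is enough to split an element. This sketch ignores quoting, phrases and any relation other than the colon. The function name is hypothetical.

```python
from typing import Optional, Tuple


def split_element(element: str) -> Tuple[Optional[str], str]:
    """Split a query language element into (field, value).

    The separator is the last colon, so "dc:title:grandet" yields the
    field "dc:title". An element without a colon has no field.
    """
    field, sep, value = element.rpartition(":")
    if not sep:
        return None, element
    return field, value
```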
|

The colon, if present, means "contains". Xesam defines other relations,
which are mostly unsupported for now (except in special cases, described
further down).

All elements in the search entry are normally combined with an implicit
AND. It is possible to specify that elements be OR'ed instead, as in
Beatles OR Lennon. The OR must be entered literally (capitals), and it has
...
Modifiers can be set on a phrase clause, for example to specify a
proximity search (unordered). See the modifier section.

Recoll currently manages the following default fields:

* title, subject or caption are synonyms which specify data to be
  searched for in the document title or subject.

* author or from for searching the document's originators.

* recipient or to for searching the document's recipients.

* keyword for searching the document-specified keywords (few documents
  actually have any).

* filename for the document's file name.

* ext specifies the file name extension (Ex: ext:html).

The field syntax also supports a few field-like, but special, criteria:

* dir for filtering the results on file location (Ex:
  dir:/home/me/somedir). -dir also works to find results not in the
  specified directory (release >= 1.15.8). A tilde inside the value will
  be expanded to the home directory. Wildcards will not be expanded. You
  cannot use OR with dir clauses (this restriction may go away in the
  future).
...
  and are best avoided.

  You need to use double-quotes around the path value if it contains
  space characters.

* size for filtering the results on file size. Example: size<10000. You
  can use <, > or = as operators. You can specify a range like the
  following: size>100 size<1000. The usual k/K, m/M, g/G, t/T can be
  used as (decimal) multipliers. Ex: size>1k to search for files bigger
  than 1000 bytes.

* date for searching or filtering on dates. The syntax for the argument
  is based on the ISO 8601 standard for dates and time intervals. Only
  dates are supported, not times. The general syntax is two elements
  separated by a / character. Each element can be a date or a period of
  time. Periods are specified as PnYnMnD. The n numbers are the
  respective numbers of years, months or days, any of which may be
  missing. Dates are specified as YYYY-MM-DD. The days and months parts
  may be missing. If the / is present but an element is missing, the
  missing element is interpreted as the lowest or highest date in the
  index. Examples:

  o 2001-03-01/2002-05-01: the basic syntax for an interval of dates.

  o 2001-03-01/P1Y2M: the same interval specified with a period.

  o 2001/: from the beginning of 2001 to the latest date in the index.

  o 2001: the whole year of 2001.

  o P2D/: means 2 days ago up to now if there are no documents with
    dates in the future.

  o /2003: all documents from 2003 or older.

  Periods can also be specified with small letters (e.g.: p2y).

* mime or format for specifying the mime type. This one is quite special
  because you can specify several values which will be OR'ed (the normal
  default for the language is AND). Ex: mime:text/plain mime:text/html.
  Specifying an explicit boolean operator before a mime specification is
  not supported and will produce strange results. You can filter out
  certain types by using negation (-mime:some/type), and you can use
  wildcards in the value (mime:text/*). Note that mime is the ONLY field
  with an OR default. For example, you do need to use OR with ext terms.

* type or rclcat for specifying the category (as in
  text/media/presentation/etc.). The classification of mime types in
  categories is defined in the Recoll configuration (mimeconf), and can
  be modified or extended. The default category names are those which
  permit filtering results in the main GUI screen. Categories are OR'ed
  like mime types above. They cannot be negated with -.
...
The document filters used while indexing can create other fields with
arbitrary names, and aliases may be defined in the configuration, so the
exact field search possibilities may differ on your system if the
configuration has been customised.
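
The size and date criteria above use a compact syntax. Purely as an illustration, here is a Python sketch that parses both into plain tuples. The function names are hypothetical, the sketch does not resolve periods into calendar dates, and it does not reproduce Recoll's own parser.

```python
import re
from typing import Optional, Tuple

# Decimal multipliers accepted after a size value (k/K = 1000, ...).
SIZE_MULT = {"k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12}

SIZE_RE = re.compile(r"size([<>=])(\d+)([kKmMgGtT]?)")
PERIOD_RE = re.compile(r"[Pp](?:(\d+)[Yy])?(?:(\d+)[Mm])?(?:(\d+)[Dd])?")
DATE_RE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_size(clause: str) -> Tuple[str, int]:
    """'size>1k' -> ('>', 1000)."""
    m = SIZE_RE.fullmatch(clause)
    if m is None:
        raise ValueError(f"not a size clause: {clause!r}")
    op, number, suffix = m.groups()
    return op, int(number) * SIZE_MULT.get(suffix.lower(), 1)


def parse_period(text: str) -> Tuple[int, int, int]:
    """'P1Y2M' -> (1, 2, 0): years, months, days; missing parts are 0."""
    m = PERIOD_RE.fullmatch(text)
    if m is None or not any(m.groups()):
        raise ValueError(f"not a period: {text!r}")
    return tuple(int(g) if g else 0 for g in m.groups())


def parse_date(text: str) -> Tuple[int, Optional[int], Optional[int]]:
    """'2001-03-01' -> (2001, 3, 1); month and day may be missing."""
    m = DATE_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a date: {text!r}")
    year, month, day = m.groups()
    return int(year), int(month) if month else None, int(day) if day else None


def parse_date_value(value: str):
    """Split a date: value into its (start, end) elements.

    Each element is ('date', ...), ('period', ...) or None when it is
    missing (meaning the lowest or highest date in the index). A value
    without '/' is one date covering its whole span, so it bounds both ends.
    """
    def element(text):
        if not text:
            return None
        if text[0] in "Pp":
            return ("period", parse_period(text))
        return ("date", parse_date(text))

    first, sep, second = value.partition("/")
    if not sep:
        return element(first), element(first)
    return element(first), element(second)
```

For example, parse_date_value("/2003") returns an open start and the year 2003 as the end bound.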

3.4.1. Modifiers

Some characters are recognized as search modifiers when found immediately
after the closing double quote of a phrase, as in "some term"modifierchars.
The actual "phrase" can be a single term of course. Supported modifiers:

* l can be used to turn off stemming (mostly makes sense with p because
  stemming is off by default for phrases).

* o can be used to specify a "slack" for phrase and proximity searches:
  the number of additional terms that may be found between the specified
  ones. If o is followed by an integer number, this is the slack, else
  the default is 10.

* p can be used to turn the default phrase search into a proximity one
  (unordered). Example: "order any in"p

* C will turn on case sensitivity (if the index supports it).

* D will turn on diacritics sensitivity (if the index supports it).

* A weight can be specified for a query element by specifying a decimal
  value at the start of the modifiers. Example: "Important"2.5.
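
As a sketch of how the rules above combine (weight first, then letters, with o optionally followed by a slack number), here is a small Python decoder. The Modifiers class and parse_clause are hypothetical names, and treating a plain phrase as having a slack of 0 is an assumption, not something stated in this manual.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

DEFAULT_SLACK = 10  # slack used when 'o' is not followed by a number


@dataclass
class Modifiers:
    weight: Optional[float] = None      # decimal value at the start
    no_stem: bool = False               # l
    slack: int = 0                      # o / oN (0 assumed for a plain phrase)
    proximity: bool = False             # p
    case_sensitive: bool = False        # C
    diacritics_sensitive: bool = False  # D


# Single-letter modifiers that simply switch a flag on.
FLAGS = {"l": "no_stem", "p": "proximity", "C": "case_sensitive",
         "D": "diacritics_sensitive"}


def parse_clause(clause: str) -> Tuple[str, Modifiers]:
    """'"order any in"p' -> ('order any in', Modifiers(proximity=True))."""
    m = re.fullmatch(r'"([^"]*)"(.*)', clause)
    if m is None:
        raise ValueError(f"not a quoted clause: {clause!r}")
    phrase, chars = m.groups()
    mods = Modifiers()
    # An optional decimal weight must come first.
    weight = re.match(r"\d+(?:\.\d+)?", chars)
    if weight:
        mods.weight = float(weight.group())
        chars = chars[weight.end():]
    i = 0
    while i < len(chars):
        c = chars[i]
        i += 1
        if c in FLAGS:
            setattr(mods, FLAGS[c], True)
        elif c == "o":
            # 'o' may be followed by an explicit slack value.
            number = re.match(r"\d+", chars[i:])
            if number:
                mods.slack = int(number.group())
                i += number.end()
            else:
                mods.slack = DEFAULT_SLACK
        else:
            raise ValueError(f"unknown modifier {c!r} in {clause!r}")
    return phrase, mods
```

With this sketch, "some term"o3l would decode to a slack of 3 with stemming turned off.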

3.5. Search case and diacritics sensitivity

For Recoll versions 1.18 and later, and when working with a raw index (not
the default), searches can be made sensitive to character case and
...
will search for the term résumé exactly (resume will not be a match).

When either case or diacritics sensitivity is activated, stem expansion is
turned off. Having both sensitivity and stemming would not make much sense.
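
Conceptually, insensitivity means that terms are compared after the ignored information has been removed. The Python sketch below illustrates this with unicodedata, which decomposes accented characters and drops the combining marks. It is a stand-in for the idea only: Recoll's own case folding and unaccenting rules may differ, and fold and same_term are hypothetical names.

```python
import unicodedata


def fold(term: str, case_sensitive: bool = False,
         diacritics_sensitive: bool = False) -> str:
    """Comparison key for a term: drop what the search is insensitive to."""
    if not diacritics_sensitive:
        # Decompose (e.g. é -> e + combining acute), then drop the marks.
        decomposed = unicodedata.normalize("NFD", term)
        term = "".join(c for c in decomposed if not unicodedata.combining(c))
    if not case_sensitive:
        term = term.casefold()
    # Recompose so precomposed and decomposed input compare equal.
    return unicodedata.normalize("NFC", term)


def same_term(a: str, b: str, **sensitivity) -> bool:
    return fold(a, **sensitivity) == fold(b, **sensitivity)
```

By default same_term("résumé", "resume") is true; with diacritics_sensitive=True it is false, matching the behaviour described above.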

3.6. Anchored searches and wildcards

Some special characters are interpreted by Recoll in search strings to
expand or specialize the search. Wildcards expand a root term in
controlled ways. Anchor characters can restrict a search to succeed only
if the match is found at or near the beginning or the end of the document
or one of its fields.

3.6.1. More about wildcards

All words entered in Recoll search fields will be processed for wildcard
expansion before the request is finally executed.

The wildcard characters are:

* The asterisk (*) matches 0 or more characters.

* The question mark (?) matches a single character.

* Square brackets ([]) define a set of characters to be matched (e.g.
  [abc] matches a single character which may be 'a', 'b' or 'c', and
  [0-9] matches any digit).

You should be aware of a few things before using wildcards.

* Using a wildcard character at the beginning of a word can make for a
  slow search because Recoll will have to scan the whole index term list
  to find the matches.

* When working with a raw index (preserving character case and
  diacritics), the literal part of a wildcard expression will be matched
  exactly for case and diacritics.

* Using a * at the end of a word can produce more matches than you would
  expect, and strange search results. You can use the term explorer tool
  to check what completions exist for a given term. You can also see
  exactly what search was performed by clicking on the link at the top
  of the result list. In general, for natural language terms, stem
  expansion will produce better results than an ending * (stem expansion
  is turned off when any wildcard character appears in the term).
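
To see why a leading wildcard is expensive, here is a sketch that expands a pattern against a sorted term list, using Python's fnmatch module (which understands the same *, ? and [] characters) as a stand-in. It only illustrates the principle and is not Recoll's implementation: with a literal prefix, the candidate terms form one contiguous run located by binary search, while a leading wildcard forces every term to be examined.

```python
import bisect
import fnmatch
from typing import List, Tuple


def literal_prefix(pattern: str) -> str:
    """The characters before the first wildcard character."""
    for i, ch in enumerate(pattern):
        if ch in "*?[":
            return pattern[:i]
    return pattern


def expand(pattern: str, sorted_terms: List[str]) -> Tuple[List[str], int]:
    """Return (matching terms, number of terms examined)."""
    prefix = literal_prefix(pattern)
    # Terms sharing the literal prefix are contiguous in a sorted list.
    start = bisect.bisect_left(sorted_terms, prefix)
    examined = 0
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # past the run of candidates
        examined += 1
        if fnmatch.fnmatchcase(term, pattern):
            matches.append(term)
    return matches, examined
```

On a seven-term list, band* only examines the two terms starting with "band", whereas *nd* has to look at all seven.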

3.6.2. Anchored searches

Two characters are used to specify that a search hit should occur at the
beginning or at the end of the text. ^ at the beginning of a term or
phrase constrains the search to happen at the start, $ at the end forces it
...
structured documents like scientific articles, in case explicit metadata
has not been supplied (a very frequent case), for example for looking for
matches inside the abstract or the list of authors (which occur at the top
of the document).
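
The anchoring idea can be sketched on a tokenized text as follows. This is an illustration under assumptions: the slack window that defines "near" is a hypothetical parameter here, not taken from Recoll, and anchored_match only mimics the effect of a leading ^ (at_start) or a trailing $ (at_end).

```python
from typing import List


def phrase_positions(tokens: List[str], phrase: List[str]) -> List[int]:
    """Start positions of every occurrence of phrase in tokens."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def anchored_match(tokens: List[str], phrase: List[str], at_start: bool = False,
                   at_end: bool = False, slack: int = 0) -> bool:
    """True if phrase occurs within `slack` terms of the requested anchor(s)."""
    for i in phrase_positions(tokens, phrase):
        terms_after = len(tokens) - (i + len(phrase))
        if at_start and i > slack:
            continue  # too far from the beginning
        if at_end and terms_after > slack:
            continue  # too far from the end
        return True
    return False
```

For instance, with the tokens of "abstract we study anchored searches", the term study is accepted as anchored at the start only when the window is at least 2 terms.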

3.7. Desktop integration

Being independent of the desktop type has its drawbacks: Recoll desktop
integration is minimal. However, there are a few tools available:

* The KDE KIO Slave was described in a previous section.

* If you use a recent version of Ubuntu Linux, you may find the Ubuntu
  Unity Lens module useful.

* There is also an independently developed Krunner plugin.

Here are a few other things that may help.

3.7.1. Hotkeying recoll

It is surprisingly convenient to be able to show or hide the Recoll GUI
with a single keystroke. Recoll comes with a small Python script, based on
the libwnck window manager interface library, which will allow you to do
just this. The detailed instructions are on this wiki page.

3.7.2. The KDE Kicker Recoll applet

This is probably obsolete now. Anyway:

...
query (in query language form), and an icon which can be used to restrict
the search to certain types of files. It is quite primitive, and launches
a new recoll GUI instance every time (even if it is already running). You
may find it useful anyway.

Chapter 4. Programming interface

Recoll has an Application Programming Interface, usable both for indexing
and searching, currently accessible from the Python language.

Another, less radical, way to extend the application is to write filters
for new types of documents.

The processing of metadata attributes for documents (fields) is highly
configurable.

4.1. Writing a document filter

Recoll filters cooperate to translate from the multitude of input document
formats (simple ones such as opendocument or acrobat, or compound ones
such as Zip or Email) into the final Recoll indexing input format, which
may be text/plain or text/html. Most filters are executable programs or
scripts. A few filters are coded in C++ and live inside recollindex. This
latter kind will not be described here.

There are currently (1.18 and since 1.13) two kinds of external executable
filters:

* Simple filters (exec filters) run once and exit. They can be bare
  programs like antiword, or scripts using other programs. They are very
  simple to write, because they just need to print the converted
  document to the standard output. Their output can be text/plain or
  text/html.

* Multiple filters (execm filters) run as long as their master process
  (recollindex) is active. They can process multiple files (sparing the
  process startup time, which can be very significant), or multiple
  documents per file (e.g.: for zip or chm files). They communicate with
  the indexer through a simple protocol, but are nevertheless a bit more
  complicated than the older kind. Most of the new filters are written in
  Python, using a common module to handle the protocol. One exception is
  rclimg, which is written in Perl. The subdocuments output by these
  filters can be directly indexable (text or HTML), or they can be other
  simple or compound documents that will need to be processed by another
  filter.

In both cases, filters deal with regular file system files, and can
process either a single document, or a linear list of documents in each
file. Recoll is responsible for performing up-to-date checks, dealing with
more complex embedding, and other upper-level issues.

In the extreme case of a simple filter returning a document in text/plain
format, no metadata can be transferred from the filter to the indexer.
Generic metadata, like document size or modification date, will be
gathered and stored by the indexer.

Filters that produce text/html format can return an arbitrary amount of
metadata inside HTML meta tags. These will be processed according to the
directives found in the fields configuration file.

The filters that can handle multiple documents per file return a single
piece of data to identify each document inside the file. This piece of
data, called an ipath element, will be sent back by Recoll to extract the
document at query time, for previewing, or for creating a temporary file
to be opened by a viewer.

The following section describes the simple filters, and the next one gives
a few explanations about the execm ones. You could conceivably write a
simple filter with only the elements in the manual. This is not the case
for execm filters, for which you will have to look at the code.

4.1.1. Simple filters

Recoll simple filters are usually shell-scripts, but this is in no way
necessary. Extracting the text from the native format is the difficult
...
You should look at one of the simple filters, for example rclps, for a
starting point.

Don't forget to make your filter executable before testing!
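
Since a simple filter only has to print the converted document, it can be written in any language. Here is a sketch of one in Python for a hypothetical format: Latin-1 text with old-style CR line endings, converted to UTF-8 text/plain. That the file name arrives as the first command-line argument is an assumption of this sketch, as is the format itself.

```python
#!/usr/bin/env python3
"""Hypothetical simple (exec) filter: Latin-1 text with CR line endings
converted to UTF-8 text/plain on standard output."""
import sys
from typing import List


def convert(data: bytes) -> str:
    text = data.decode("latin-1")  # every byte value is valid Latin-1
    # Normalize CRLF and lone CR line endings to LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def main(argv: List[str]) -> int:
    if len(argv) != 2:
        print(f"usage: {argv[0]} <file>", file=sys.stderr)
        return 1
    try:
        with open(argv[1], "rb") as f:
            data = f.read()
    except OSError as err:
        print(f"{argv[0]}: {err}", file=sys.stderr)
        return 1
    sys.stdout.buffer.write(convert(data).encode("utf-8"))
    return 0
```

Installed as an executable filter, the script would end with the usual if __name__ == "__main__": sys.exit(main(sys.argv)) guard. Its configuration entry would have to declare text/plain output in utf-8, as in the antiword example of the next sections.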

4.1.2. "Multiple" filters

If you can program and want to write an execm filter, it should not be too
difficult to make sense of one of the existing modules. For example, look
at rclzip, which uses Zip file paths as identifiers (ipath), and rclics,
which uses an integer index. Also have a look at the comments inside the
internfile/mh_execm.h file and possibly at the corresponding module.

execm filters sometimes need to make a choice for the nature of the ipath
elements that they use in communication with the indexer. Here are a few
guidelines:

* Use ASCII or UTF-8 (if the identifier is an integer, print it, for
  example, like printf %d would do).

* If at all possible, the data should make some kind of sense when
  printed to a log file, to help with debugging.

* Recoll uses a colon (:) as a separator to store a complex path
  internally (for deeper embedding). Colons inside the ipath elements
  output by a filter will be escaped, but would be a bad choice as a
  filter-specific separator (mostly, again, for debugging issues).

In any case, the main goal is that it should be easy for the filter to
extract the target document, given the file name and the ipath element.

execm filters will also produce a document with a null ipath element.
Depending on the type of document, this may have some associated data
(e.g. the body of an email message), or none (typical for an archive
file). If it is empty, this document will be useful anyway for some
operations, as the parent of the actual data documents.
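
The Zip case shows how natural the pairing of ipath element and extraction can be. Below is a sketch using Python's zipfile module. It is illustrative only: it is not the rclzip code and it leaves out the execm communication protocol entirely. Each member path serves as its ipath element, and the member can be read back from the file name plus that element.

```python
import zipfile
from typing import List


def list_ipaths(path: str) -> List[str]:
    """One ipath element per archive member: its path inside the Zip file.

    Member paths are text and use '/' rather than ':', which fits the
    guidelines above (readable in logs, no colon separators).
    """
    with zipfile.ZipFile(path) as zf:
        return [info.filename for info in zf.infolist() if not info.is_dir()]


def extract(path: str, ipath: str) -> bytes:
    """Given the file name and an ipath element, return the member's data."""
    with zipfile.ZipFile(path) as zf:
        return zf.read(ipath)
```

An integer-indexed filter would instead print each index as decimal text, as the first guideline suggests.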

4.1.3. Telling Recoll about the filter

There are two elements that link a file to the filter which should process
it: the association of file to mime type and the association of a mime
type with a filter.

...
application/x-chm = execm rclchm

The fragment specifies that:

* application/msword files are processed by executing the antiword
  program, which outputs text/plain encoded in utf-8.

* application/ogg files are processed by the rclogg script, with default
  output type (text/html, with encoding specified in the header, or
  utf-8 by default).

* text/rtf is processed by unrtf, which outputs text/html. The
  iso-8859-1 encoding is specified because it is not the utf-8 default,
  and it is not output by unrtf in the HTML header section.

* application/x-chm is processed by a persistent filter. This is
  determined by the execm keyword.

4.1.4. Filter HTML output

The output HTML can be very minimal, as in the following example:

<html><head>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
...
<meta name="somefield" content="Some textual data" />

See the following section for details about configuring how field data is
processed by the indexer.
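
A filter can be any program that prints such a document on its standard output. The following Python sketch builds a minimal HTML page with the Content-Type meta element shown above, an optional title, and one meta element per field. The function name make_filter_html and the choice of wrapping the document text in a <pre> element are assumptions made for illustration, not requirements stated by Recoll.

```python
import html


def make_filter_html(text, title=None, fields=None):
    """Build a minimal HTML document of the kind a filter could print.

    Hypothetical helper: the <pre> body is an illustrative choice.
    """
    head = ['<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">']
    if title is not None:
        head.append("<title>%s</title>" % html.escape(title))
    # One meta element per field, as in the "somefield" example above
    for name, value in (fields or {}).items():
        head.append('<meta name="%s" content="%s" />'
                    % (html.escape(name, quote=True),
                       html.escape(value, quote=True)))
    return ("<html><head>\n" + "\n".join(head) + "\n</head>\n<body>\n<pre>"
            + html.escape(text) + "</pre>\n</body></html>\n")


if __name__ == "__main__":
    print(make_filter_html("Hello & goodbye", title="Demo",
                           fields={"somefield": "Some textual data"}))
```

Escaping the text and attribute values keeps characters such as & or < in the document from breaking the HTML that the indexer parses.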

4.1.5. Page numbers

The indexer interprets ^L (form feed) characters in the filter output as
page breaks, and records them. At query time, this allows starting a
viewer on the right page for a hit or a snippet. Currently, only the PDF,
Postscript and DVI filters generate page breaks.
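
To make the convention concrete, here is a small Python sketch showing how page breaks marked with ^L (the "\f" form feed character) let a character position be mapped back to a page number. The helper names are hypothetical and this is not Recoll's internal code; it only illustrates why emitting one ^L between pages is enough information.

```python
FORM_FEED = "\f"  # the ^L character (ASCII 0x0C)


def join_pages(pages):
    """Join per-page text the way a filter could mark page breaks."""
    return FORM_FEED.join(pages)


def page_of_offset(text, offset):
    """Return the 1-based page number containing character `offset`."""
    # Every form feed before the offset means one more page has started
    return text.count(FORM_FEED, 0, offset) + 1


if __name__ == "__main__":
    doc = join_pages(["first page", "second page", "third"])
    print(page_of_offset(doc, doc.index("second")))
```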

4.2. Field data processing

Fields are named pieces of information in or about documents, such as
title, author, or abstract.

...
Recoll defines a number of default fields. Additional ones can be output
by filters, and described in the fields configuration file.

Fields can be:

o indexed, meaning that their terms are separately stored in inverted
lists (with a specific prefix), so that a field-specific search is
possible.

o stored, meaning that their value is recorded in the index data record
for the document, and can be returned and displayed with search
results.

A field can be indexed, stored, or both. This and other aspects of field
handling are defined inside the fields configuration file.

The sequence of events for field processing is as follows:

o During indexing, recollindex scans all meta fields in HTML documents
(most document types are transformed into HTML at some point). It
compares the name of each element to the configuration defining what
should be done with fields (the fields file).

o If the name of the meta element matches that of a field that should
be indexed, the contents are processed and the terms are entered into
the index with the prefix defined in the fields file.

o If the name of the meta element matches that of a field that should
be stored, the content of the element is stored with the document data
record, from which it can be extracted and displayed at query time.

o At query time, if a field search is performed, the index prefix is
computed and the match is only performed against appropriately
prefixed terms in the index.

o At query time, the field can be displayed inside the result list by
using the appropriate directive in the definition of the result list
paragraph format. All fields are displayed on the fields screen of the
preview window (which you can reach through the right-click menu).
This is independent of whether the search which produced the results
used the field or not.
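
The indexing and query steps can be mimicked in a few lines of Python. In this sketch the FIELDS table stands in for the fields configuration file, and its prefixes (XT, XA, XSF) are made-up values, not Recoll's real ones. Term extraction is also much simpler than Recoll's (no stemming or accent handling). The sketch shows the two independent outcomes, prefixed index terms and stored values, and why a field search only matches prefixed terms.

```python
import re

# Hypothetical stand-in for the fields file: prefixes are invented.
FIELDS = {
    "title": {"prefix": "XT", "stored": True},
    "author": {"prefix": "XA", "stored": True},
    "somefield": {"prefix": "XSF", "stored": False},
}


def process_meta(meta, config=FIELDS):
    """Split meta name/content pairs into prefixed terms and stored values."""
    terms, stored = [], {}
    for name, content in meta.items():
        rule = config.get(name.lower())
        if rule is None:
            continue  # not described in the configuration: ignored here
        if rule["prefix"]:
            # indexed: each term is entered with the field's prefix
            terms += [rule["prefix"] + w.lower()
                      for w in re.findall(r"\w+", content)]
        if rule["stored"]:
            # stored: kept with the document data record for display
            stored[name.lower()] = content
    return terms, stored


def field_query_terms(field, words, config=FIELDS):
    """At query time, a field search only looks at prefixed terms."""
    prefix = config[field]["prefix"]
    return [prefix + w.lower() for w in words]
```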

...
comments inside the file.

You can also have a look at the example on the Wiki, detailing how one
could add a page count field to PDF documents for display inside result
lists.

4.3. API

4.3.1. Interface elements

...
is not used at all). The reason is that the main document indexer purge
pass would remove all the other indexer's documents, as they were not seen
during indexing. The main indexer's documents would also probably be a
problem for the external indexer's purge operation.

4.3.2. Python interface

4.3.2.1. Introduction

Recoll versions after 1.11 define a Python programming interface, both for
...
can then use to build and install the module:

cd recoll-xxx/python/recoll
python setup.py build
python setup.py install

4.3.2.2. Interface manual

NAME
recoll - This is an interface to the Recoll full text indexer.
...
|
|  Methods defined here:
|
|  execute(...)
|      execute(query_string, stemming=1|0, stemlang="stemming language")
|
|      Starts a search for query_string, a Recoll search language string
|      (mostly Xesam-compatible).
|      The query can be a simple list of terms (and'ed by default), or more
|      complicated with field specs etc. See the Recoll manual.
...
confdir specifies a Recoll configuration directory
(the default is built like for any Recoll program).
extra_dbs is a list of external databases (xapian directories)
writable decides if we can index new data through this connection

4.3.2.3. Example code

The following sample queries the index with a user language string. See
the python/samples directory inside the Recoll source for other examples.

#!/usr/bin/env python

import recoll

db = recoll.connect()
db.setAbstractParams(maxchars=80, contextwords=2)

...
abs = db.makeDocAbstract(doc, query).encode('utf-8')
print abs
print


Chapter 5. Installation and configuration

5.1. Installing a binary copy

There are three types of binary Recoll installations:

o Through your system's normal software distribution framework (e.g.,
Debian/Ubuntu apt or FreeBSD ports).

o From a package downloaded from the Recoll web site.

o From a prebuilt tree downloaded from the Recoll web site.

In all cases, the strict software dependencies (e.g., on Xapian or iconv)
will be satisfied automatically, so you should not have to worry about
them.

You will only have to check or install supporting applications for the
...

You may also want to have a look at the configuration section (though
this may not be necessary for a quick test with default parameters). Most
parameters can be set more conveniently from the GUI.

5.1.1. Installing through a package system

If you use a BSD-type port system or a prebuilt package (DEB, RPM,
manually or through the system software configuration utility), just
follow the usual procedure for your system.

5.1.2. Installing a prebuilt Recoll

The unpackaged binary versions on the Recoll web site are just compressed
tar files of a build tree, where only the useful parts were kept
(executables and sample configuration).
...
libiconv, to make installation easier (no dependencies).

After extracting the tar file, you can proceed with installation as if you
had built the package from source (that is, just type make install). The
binary trees are built for installation to /usr/local.

5.2. Supporting packages

Recoll uses external applications to index some file types. You need to
install them for the file types that you wish to have indexed (these are
...
by ad hoc filter code now use the xsltproc command, which usually comes
with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.

Now for the list:

o Openoffice files need unzip and xsltproc.

o PDF files need pdftotext, which is part of the Xpdf or Poppler
packages.

o Postscript files need pstotext. The original version has an issue with
shell characters in file names, which is corrected in recent packages.
See the Recoll helper applications page for more detail.

o MS Word needs antiword. It is also useful to have wvWare installed, as
it may be used as a fallback for some files which antiword does not
handle.

o MS Excel and PowerPoint need catdoc.

o MS Open XML (docx) needs xsltproc.

o Wordperfect files need wpd2html from the libwpd (or libwpd-tools on
Ubuntu) package.

o RTF files need unrtf, which, in its standard version, has a lot of
trouble with non-western character sets. Check the Recoll helper
applications page.

o TeX files need untex or detex. Check the Recoll helper applications
page for sources if it's not packaged for your distribution.

o dvi files need dvips.

o djvu files need djvutxt and djvused from the DjVuLibre package.

o Audio files: Recoll releases before 1.13 used the id3info command from
the id3lib package to extract mp3 tag information, metaflac (standard
flac tools) for flac files, and ogginfo (vorbis tools) for ogg files.
Releases 1.14 and later use a single Python filter based on mutagen
for all audio file types.

o Pictures: Recoll uses the Exiftool Perl package to extract tag
information. Most image file formats are supported. Note that there
may not be much interest in indexing the technical tags (image size,
aperture, etc.). This is only of interest if you store personal tags
or textual descriptions inside the image files.

o chm: files in Microsoft help format need Python and the pychm module
(which needs chmlib).

o ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar
module. icalendar is not needed for newer versions, which use internal
code.

o Zip archives need Python (and the standard zipfile module).

o Rar archives need Python, the rarfile Python module and the unrar
utility.

o Midi karaoke files need Python and the Midi module.

o Konqueror webarchive format files need Python (the filter uses the
Tarfile module).

o mimehtml web archive format (support is based on the email filter,
which introduces some mild weirdness, but is still usable).

Text, HTML, email folders, and Scribus files are processed internally. Lyx
is used to index Lyx files. Many filters need iconv and the standard sed
and awk.
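
Before indexing, it can be handy to check which of these helpers are actually installed. The following Python sketch (Python 3.3 or later, for shutil.which) looks up a hand-copied subset of the commands named in the list. The HELPERS table is an assumption derived from this list, not something Recoll exposes, and it ignores alternatives such as untex/detex as well as Python modules and Perl packages.

```python
import shutil

# Hand-copied subset of the list above; not read from Recoll itself.
HELPERS = {
    "openoffice": ["unzip", "xsltproc"],
    "pdf": ["pdftotext"],
    "postscript": ["pstotext"],
    "msword": ["antiword"],
    "excel/powerpoint": ["catdoc"],
    "docx": ["xsltproc"],
    "wordperfect": ["wpd2html"],
    "rtf": ["unrtf"],
    "dvi": ["dvips"],
    "djvu": ["djvutxt", "djvused"],
}


def missing_helpers(helpers=HELPERS, which=shutil.which):
    """Return {document type: [missing commands]} for incomplete types."""
    missing = {}
    for doctype, commands in helpers.items():
        absent = [cmd for cmd in commands if which(cmd) is None]
        if absent:
            missing[doctype] = absent
    return missing


if __name__ == "__main__":
    for doctype, absent in sorted(missing_helpers().items()):
        print("%s: missing %s" % (doctype, ", ".join(absent)))
```

The which parameter exists only so the lookup can be replaced, for instance to test the function without touching the real PATH.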

5.3. Building from source

5.3.1. Prerequisites

C++ compiler. Up to Recoll version 1.13.04, its absence can manifest
itself through strange messages about a missing iconv_open.

Development files for Xapian core.

Important

If you are building Xapian for an older CPU (before Pentium 4 or Athlon
64), you need to add the --disable-sse flag to the configure command.
Otherwise, all Xapian applications will crash with an illegal instruction
error.

Development files for Qt.

Development files for X11 and zlib.

...
are using FreeBSD, there is a port).

You may also need libiconv. Recoll currently uses version 1.9 (this should
not be critical). On Linux systems, the iconv interface is part of libc
and you should not need to do anything special.

5.3.2. Building

Recoll has been built on Linux, FreeBSD, Mac OS X, and Solaris. Most
versions after 2005 should be OK, and maybe some older ones too (Solaris 8 is
...
very much welcome patches.

Depending on the Qt 3 configuration on your system, you may have to set
the QTDIR and QMAKESPEC variables in your environment:

o QTDIR should point to the directory above the one that holds the Qt
include files (e.g., if qt.h is /usr/local/qt/include/qt.h, QTDIR should
be /usr/local/qt).

o QMAKESPEC should be set to the name of one of the Qt mkspecs
sub-directories (e.g., linux-g++).

On many Linux systems, QTDIR is set by the login scripts, and QMAKESPEC
is not needed because there is a default link in mkspecs/.

Neither QTDIR nor QMAKESPEC should be needed with Qt 4: configuration
details are entirely determined by qmake (which is quite often installed
as qmake-qt4).

Configure options:

o --without-aspell will disable the code for phonetic matching of search
terms.

o --with-fam or --with-inotify will enable the code for real time
indexing. Inotify support is enabled by default on recent Linux
systems.

o --disable-webkit is available from version 1.17 to implement the
result list with a Qt QTextBrowser instead of a WebKit widget, if you do
not want to or cannot depend on WebKit.

o --enable-xattr will enable code to fetch data from file extended
attributes. This is only useful if some application stores data in
there, and also needs some simple configuration (see comments in the
fields configuration file).

o --enable-camelcase will enable splitting camelCase words. This is not
enabled by default as it has the unfortunate side-effect of making
some phrase searches quite confusing: for example, "MySQL manual" would
be matched by "MySQL manual" and "my sql manual" but not by
"mysql manual" (only inside phrase searches).

o --with-file-command specifies the version of the 'file' command to use
(e.g., --with-file-command=/usr/local/bin/file). This can be useful to
enable the GNU version on systems where the native one is bad.

o --disable-qtgui disables the Qt interface. This allows building the
indexer and the command line search program in the absence of a Qt
environment.

o --disable-x11mon disables X11 connection monitoring inside recollindex.
Together with --disable-qtgui, this allows building recoll without Qt
and X11.

o Of course, the usual autoconf configure options, like --prefix, apply.
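
To see why camelCase splitting affects phrase searches, consider how a word like MySQL might be cut into parts. The regular expression below is only an illustration of camelCase splitting in Python, not the algorithm used by Recoll's text splitter, and split_camelcase is a hypothetical name. With splitting, MySQL becomes the two adjacent terms my and sql, while the single word mysql stays one term, which is one plausible reading of the phrase-matching behaviour described for --enable-camelcase.

```python
import re

# Alternatives, tried in order at each position:
#   a run of capitals followed by a capitalised word (XML in XMLHttp),
#   an optionally capitalised lowercase word (My, manual),
#   a trailing run of capitals (SQL), or a run of digits.
_CAMEL_PART = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def split_camelcase(word):
    """Split a camelCase word into lowercase parts (illustration only)."""
    return [part.lower() for part in _CAMEL_PART.findall(word)]


if __name__ == "__main__":
    print(split_camelcase("MySQL"))  # ['my', 'sql']
    print(split_camelcase("mysql"))  # ['mysql']
```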

Normal procedure:

cd recoll-xxx
./configure
...
There is little auto-configuration. The configure script will mainly link
one of the system-specific files in the mk directory to mk/sysconf. If
your system is not known yet, it will tell you as much, and you may want
to manually copy and modify one of the existing files (the new file name
should be the output of uname -s).
|

5.3.3. Installation

Either type make install or execute recollinstall prefix, in the root of
the source tree. This will copy the commands to prefix/bin and the sample
...
RECOLL_DATADIR environment variable to indicate where the shared data is
to be found (for example, with (ba)sh: export
RECOLL_DATADIR=/some/path/share/recoll).

You can then proceed to configuration.

5.4. Configuration overview

Most of the parameters specific to the recoll GUI are set through the
Preferences menu and stored in the standard Qt place
...
defaultcharset = utf-8

There are three kinds of lines:

o Comment (starts with #) or empty.

o Parameter assignment (name = value).

o Section definition ([somedirname]).

Depending on the type of configuration file, section definitions either
separate groups of parameters or allow redefining some parameters for a
directory sub-tree. They stay in effect until another section definition,
or the end of file, is encountered. Some of the parameters used for
...
embedded spaces can be quoted using double-quotes.
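Here is a minimal Python sketch of how a file in this format could be read. It relies on several assumptions made for illustration. Parameters placed before any section header belong to a top-level section. A [somedirname] section applies to that directory and everything below it, and the longest matching directory wins. List values are split on white space, with double quotes protecting embedded spaces. This is not Recoll's actual implementation, and `parse_conf`, `lookup` and `as_list` are hypothetical names.

```python
import os
import shlex

def parse_conf(text):
    """Parse recoll.conf-style text into {section: {name: value}}.

    Parameters before any [section] line go into the "" section; a section
    stays in effect until the next section line or the end of the text.
    """
    sections = {"": {}}
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comment or empty line
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()  # section definition
            sections.setdefault(current, {})
            continue
        name, sep, value = line.partition("=")
        if sep:  # parameter assignment
            sections[current][name.strip()] = value.strip()
    return sections

def lookup(sections, name, path):
    """Value of name for path: the longest [dir] section containing path
    wins (assumed rule), else the top-level value, else None."""
    best, best_len = sections[""].get(name), -1
    for sect, params in sections.items():
        if not sect or name not in params:
            continue
        d = os.path.expanduser(sect).rstrip("/")
        if (path == d or path.startswith(d + "/")) and len(d) > best_len:
            best, best_len = params[name], len(d)
    return best

def as_list(value):
    # White space separates elements; double quotes keep embedded spaces.
    return shlex.split(value) if value else []
```

Checking `d + "/"` rather than a bare prefix keeps a section for /home/me/old from applying to /home/me/older.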

Encoding issues. Most of the configuration parameters are plain ASCII. Two
particular sets of values may cause encoding issues:

o File path parameters may contain non-ASCII characters and should use
  the exact same byte values as found in the file system directory.
  Usually, this means that the configuration file should use the system
  default locale encoding.

o The unac_except_trans parameter should be encoded in UTF-8. If your
  system locale is not UTF-8, and you also need to specify non-ASCII
  file paths, this poses a difficulty because common text editors cannot
  handle multiple encodings in a single file. In this relatively
  unlikely case, you can edit the configuration file as two separate
  text files with appropriate encodings, and concatenate them to create
  the complete configuration.
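The two-file workaround can be sketched in Python. The key point is that the fragments are joined as raw bytes, so nothing is decoded or re-encoded. The fragment names, the ISO-8859-1 locale and the example parameter values are illustrative assumptions, and `build_recoll_conf` is a hypothetical helper.

```python
from pathlib import Path

def build_recoll_conf(parts, out):
    """Concatenate separately-encoded fragments into one file, byte for byte.

    Each fragment keeps the encoding it was written in (for example the
    locale encoding for file paths, UTF-8 for unac_except_trans).
    """
    Path(out).write_bytes(b"".join(Path(p).read_bytes() for p in parts))
```

From a shell, `cat paths.part unac.part > recoll.conf` does the same thing.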

5.4.1. Main configuration file

recoll.conf is the main configuration file. It defines things like what to
index (top directories and things to ignore), and the default character
set to use for document types which do not specify it internally.
...
start the initial indexing, which may take some time.

Most of the following parameters can be changed from the Index
Configuration menu in the recoll interface. Some can only be set by
editing the configuration file.

5.4.1.1. Parameters affecting what documents we index:

topdirs

...
indexed at startup, but not monitored.

Example of use for skipping text files only in a specific
directory:

skippedPaths = ~/somedir/*.txt
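To see which files the example pattern selects, here is a Python sketch of fnmatch(3)-style path matching. The name of the skippedPathsFnmPathname parameter suggests the FNM_PATHNAME flag, under which a * cannot match a /. The sketch assumes that behaviour when `fnm_pathname` is true. Otherwise it uses plain shell-style matching, where * also crosses directory levels, as Python's `fnmatch.fnmatchcase` does. It also assumes that ~ is expanded, as the example's ~/ implies. `is_skipped` is a hypothetical helper, not part of Recoll.

```python
import fnmatch
import os

def _match_pathname(path, pattern):
    # FNM_PATHNAME-like: compare component by component, so '*' never spans '/'.
    path_parts = path.split("/")
    pat_parts = pattern.split("/")
    return len(path_parts) == len(pat_parts) and all(
        fnmatch.fnmatchcase(p, g) for p, g in zip(path_parts, pat_parts))

def is_skipped(path, patterns, fnm_pathname=True):
    """Return True if path matches one of the skippedPaths-style patterns."""
    for pattern in patterns:
        pattern = os.path.expanduser(pattern)  # assumed '~' expansion
        if fnm_pathname:
            if _match_pathname(path, pattern):
                return True
        elif fnmatch.fnmatchcase(path, pattern):
            return True
    return False
```

With path-name matching, ~/somedir/*.txt skips text files directly inside somedir but not those in its subdirectories. Without it, the subdirectories are affected too.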

skippedPathsFnmPathname

The values in the *skippedPaths variables are matched by default
...
determining the mime type for a file (the main procedure uses
suffix associations as defined in the mimemap file). This can be
useful for files with suffix-less names, but it will also cause
the indexing of many bogus "text" files.
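The lookup order described above can be sketched in Python. The suffix table, standing in for mimemap, is consulted first. Only suffix-less or unknown names go to a fallback classifier. The table contents, the lowercasing of suffixes and the name `mime_type_for` are assumptions. In a real setup the fallback might run a command such as `file --mime-type -b`, which is where bogus "text" results can come from.

```python
import os

def mime_type_for(path, suffix_map, fallback=None):
    """Return a mime type for path, trying suffix associations first.

    suffix_map plays the role of mimemap ({".pdf": "application/pdf", ...}).
    fallback, if given, is called for names the map does not cover, such as
    suffix-less files; it stands in for a system file-type command.
    """
    suffix = os.path.splitext(path)[1].lower()  # assumption: case-insensitive
    if suffix and suffix in suffix_map:
        return suffix_map[suffix]
    if fallback is not None:
        return fallback(path)
    return None  # type unknown
```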

processwebqueue

If this is set, process the directory where web browser plugins
copy visited pages for indexing.

webqueuedir

The path to the web indexing queue. This is hard-coded in the
Firefox plugin as ~/.recollweb/ToIndex, so there should be no need
to change it.

5.4.1.2. Parameters affecting how we generate terms:

Changing some of these parameters will imply a full reindex. Also, when
using multiple indexes, it may not make sense to search indexes that don't
...
are to be set, they should be separated with a colon (':')
character (which there is currently no way to escape). For example:
localfields= rclaptg=gnus:other = val, then select a specific
viewer with mimetype|tag=... in mimeview.
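The colon-separated syntax above can be illustrated with a small Python parser. It mirrors only the rules stated in the text: pairs are separated by ':', there is no escaping, and spaces around '=' are allowed. `parse_localfields` is a hypothetical name, and Recoll's own parsing may differ in details such as whitespace handling.

```python
def parse_localfields(value):
    """Split a localfields value like 'rclaptg=gnus:other = val' into a dict.

    ':' separates the name=value pairs and cannot be escaped, so a field
    value can never contain a colon.
    """
    fields = {}
    for item in value.split(":"):
        name, sep, val = item.partition("=")
        if sep and name.strip():
            fields[name.strip()] = val.strip()
    return fields
```

Because of the missing escape, a value such as `url=http://x` is cut at the first colon and yields only `{"url": "http"}`.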

5.4.1.3. Parameters affecting where and how we store things:

dbdir

The name of the Xapian data directory. It will be created if
...
is really no sense in caching offsets for small files. The default
is 5 MB.

webcachedir

This is only used by the web browser plugin indexing code, and
defines where the cache for visited pages will live. Default:
$RECOLL_CONFDIR/webcache

webcachemaxmbs

This is only used by the web browser plugin indexing code, and
defines the maximum size for the web page cache. Default: 40 MB.

idxflushmb

Threshold (megabytes of new text data) at which we flush from memory
to the disk index. Setting this can help control memory usage. A value
of 0 means no explicit flushing: Xapian then uses its own default, which
is to flush every 10000 documents (or the value of the
XAPIAN_FLUSH_THRESHOLD environment variable, if set). This gives little
control over memory usage, because memory usage also depends on the
average document size. The default value is 10, and it is probably a bit
low. If your system usually has free memory, you can try higher values
between 20 and 80. In my experience, values beyond 100 are always
counterproductive.
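The flushing rule can be modelled in a few lines of Python. This is a simplified sketch, not Recoll's code. It counts megabytes of new text (taken here as 1024*1024 bytes, an assumption) and requests a flush once idxflushmb is reached. With a value of 0 it falls back to a document count, standing in for Xapian's default of 10000 documents. The class and method names are hypothetical.

```python
class FlushPolicy:
    """Decide when buffered index data should be written to disk."""

    def __init__(self, idxflushmb=10, doc_threshold=10000):
        self.limit_bytes = idxflushmb * 1024 * 1024  # assumed MB definition
        self.doc_threshold = doc_threshold
        self.pending_bytes = 0
        self.pending_docs = 0

    def add_document(self, text_size):
        """Record a new document's text size; return True if a flush is due."""
        self.pending_bytes += text_size
        self.pending_docs += 1
        if self.limit_bytes > 0:
            due = self.pending_bytes >= self.limit_bytes  # idxflushmb rule
        else:
            due = self.pending_docs >= self.doc_threshold  # count-based default
        if due:
            self.pending_bytes = 0
            self.pending_docs = 0
        return due
```

With idxflushmb = 0, flushing depends only on the number of documents, whatever their size. This is why the text says that setting gives little control over memory use.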

5.4.1.4. Miscellaneous parameters:

autodiacsens

...
This allows defining location-related quirks for the mailbox
handler. Currently only the tbird flag is defined, and it should
be set for directories which hold Thunderbird data, as their
folder format is weird.

5.4.2. The fields file

This file contains information about dynamic fields handling in Recoll.
Some very basic fields have hard-wired behaviour, and, mostly, you should
not change the original data inside the fields file. But you can create
...

[mail]
# Extract the X-My-Tag mail header, and use it internally with the
# mailmytag field name
x-my-tag = mailmytag
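The [mail] entry above can be pictured with Python's standard email package. The code reads a header by name and stores its value under the internal field name. Header lookup is case-insensitive, which is why the entry can use x-my-tag. This only illustrates the mapping. `extract_fields` and the dictionary it returns are hypothetical, not Recoll's mail handler.

```python
import email

def extract_fields(raw_message, header_map):
    """Map selected mail headers to internal field names.

    header_map mirrors the [mail] section, e.g. {"x-my-tag": "mailmytag"}.
    """
    msg = email.message_from_string(raw_message)
    fields = {}
    for header, field in header_map.items():
        value = msg.get(header)  # header lookup is case-insensitive
        if value is not None:
            fields[field] = value.strip()
    return fields
```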

5.4.3. The mimemap file

mimemap specifies the file name extension to mime type mappings.

...
indexed (not even the file names) for patterns in skippedNames.
recoll_noindex is used mostly for things known to be unindexable by a
given Recoll version. Having it there avoids cluttering the more
user-oriented and locally customized skippedNames.

5.4.4. The mimeconf file

mimeconf specifies how the different mime types are handled for indexing,
and which icons are displayed in the recoll result lists.

...
except if you are a Recoll developer.

The [icons] section allows you to change the icons which are displayed by
recoll in the result lists (the values are the basenames of the PNG images
inside the iconsdir directory, which is specified in recoll.conf).

5.4.5. The mimeview file

mimeview specifies which programs are started when you click on an Open
link in a result list. For example, HTML is normally displayed using firefox, but
...
mydoc.doc.gz).

The right side of each assignment holds a command to be executed for
opening the file. The following substitutions are performed:

o %D. Document date.

o %f. File name. This may be the name of a temporary file if it was
  necessary to create one (for example, to extract a subdocument from a
  container).

o %F. Original file name. Same as %f except if a temporary file is used.

o %i. Internal path, for subdocuments of containers. The format depends
  on the container type. If this appears in the command line, Recoll
  will not create a temporary file to extract the subdocument, expecting
  the called application (possibly a script) to be able to handle it.

o %M. Mime type.

o %p. Page index. Only significant for a subset of document types,
  currently only PDF, Postscript and DVI files. Can be used to start the
  viewer at the right page for a match or snippet.

o %s. Search term. The value will only be set for documents with indexed
  page numbers (for example, PDF). The value will be one of the matched
  search terms. It can be used to pre-set the value of the "Find" entry
  inside Evince, for example, for easy highlighting of the term.

o %U, %u. URL.

In addition to the predefined values above, all strings like %(fieldname)
will be replaced by the value of the field named fieldname for the
document. This could be used in combination with field customisation to
help with opening the document.
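The substitution step can be sketched in Python to show how a mimeview command line might be expanded. The sketch handles the single-letter codes listed above plus %(fieldname). It rests on several assumptions. The name `expand_command` and the `docviewer` program with its options are hypothetical. Values are passed in as dictionaries, and missing values become empty strings. The command is split into arguments before substitution, so a file name containing spaces stays a single argument.

```python
import re
import shlex

# %(fieldname) or one of the single-letter codes from the list above.
_TOKEN = re.compile(r"%\((\w+)\)|%([DfFiMpsUu])")

def expand_command(command, values, fields):
    """Turn a mimeview command into an argument list.

    values maps single-letter codes ("f", "M", "p", ...) to strings;
    fields maps field names used as %(fieldname).
    """
    def repl(m):
        if m.group(1) is not None:
            return fields.get(m.group(1), "")
        return values.get(m.group(2), "")
    return [_TOKEN.sub(repl, arg) for arg in shlex.split(command)]
```

Returning a list rather than a single string lets the caller hand it to `subprocess.run()` without going through a shell.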

5.4.6. Examples of configuration adjustments

5.4.6.1. Adding an external viewer for a non-indexed type

Imagine that you have some kind of file which does not have indexable
...
the result list (when found by file name). The file names end in .blob and
can be displayed by application blobviewer.

You need two entries in the configuration files for this to work:

o In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the
  following line:

  .blob = application/x-blobapp

  Note that the mime type is made up here, and you could call it
  diesel/oil just the same.

o In $RECOLL_CONFDIR/mimeview under the [view] section, add:

  application/x-blobapp = blobviewer %f

  We are assuming that blobviewer wants a file name parameter here; you
  would use %u if it preferred URLs.
...
mime type which it already knows, you would just need to edit mimeview.
The entries you add in your personal file override those in the central
configuration, which you do not need to alter. mimeview can also be
modified from the GUI.
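The override rule can be expressed with Python's `collections.ChainMap`, which searches its mappings in order. Here the personal mimeview entries are consulted first, then the central ones. The dictionaries are illustrative stand-ins for parsed [view] sections, not Recoll's real defaults.

```python
from collections import ChainMap

# Illustrative [view] entries; not Recoll's real defaults.
central = {"text/html": "firefox %u", "application/pdf": "evince %f"}
personal = {"application/pdf": "okular %f",
            "application/x-blobapp": "blobviewer %f"}

# Lookups search the personal mapping first, then fall back to the central one.
viewers = ChainMap(personal, central)

# Writes go only to the first mapping, so the central data is never altered.
viewers["text/plain"] = "gedit %f"
```

A ChainMap writes only to its first mapping, which mirrors the rule that you change your personal file and leave the central configuration alone.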

5.4.6.2. Adding indexing support for a new file type

Let us now imagine that the above .blob files actually contain indexable
text and that you know how to extract it with a command line program.
Getting Recoll to index the files is easy. You need to perform the above
alteration, and also to add data to the mimeconf file (typically in
~/.recoll/mimeconf):

o Under the [index] section, add the following line (more about the
  rclblob indexing script later):

  application/x-blobapp = exec rclblob

o Under the [icons] section, you should choose an icon to be displayed
  for the files inside the result lists. Icons are normally 64x64 pixel
  PNG files which live in /usr/[local/]share/recoll/images.

o Under the [categories] section, you should add the mime type where it
  makes sense (you can also create a category). Categories may be used
  for filtering in advanced search.

The rclblob filter should be an executable program or script which exists
inside /usr/[local/]share/recoll/filters. It will be given a file name as
argument and should output the text or HTML contents on the standard
output.

The filter programming section describes in more detail how to write a
filter.
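A minimal rclblob filter could look like the Python script below. It follows only the contract stated above: it receives a file name as its argument and writes HTML on standard output. How text is pulled out of a .blob file is not known here. `extract_blob_text` is therefore a hypothetical placeholder that simply reads the file as UTF-8. The HTML layout, a title plus a <pre> body with a UTF-8 charset declaration, is also an assumption. The filter programming section describes what Recoll actually expects.

```python
#!/usr/bin/env python3
"""Hypothetical rclblob filter: print the text of a .blob file as HTML."""
import html
import os
import sys

def extract_blob_text(path):
    # Placeholder: replace with the real extraction for your file format.
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read()

def to_html(title, text):
    """Wrap extracted text in a minimal HTML document, escaping markup."""
    return ("<html><head>"
            '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">'
            f"<title>{html.escape(title)}</title></head>\n"
            f"<body><pre>{html.escape(text)}</pre></body></html>\n")

def main(argv):
    if len(argv) != 2:
        sys.stderr.write("usage: rclblob <file>\n")
        return 1
    path = argv[1]
    sys.stdout.write(to_html(os.path.basename(path), extract_blob_text(path)))
    return 0

# Run as a filter only when given a file name, as Recoll would call it.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Installed as an executable file named rclblob in the filters directory, it would be run as `rclblob /path/to/file.blob`.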