|
a/src/README |
|
b/src/README |
|
... |
|
... |
35 |
|
35 |
|
36 |
2.2.1. Security aspects
|
36 |
2.2.1. Security aspects
|
37 |
|
37 |
|
38 |
2.3. The indexing configuration
|
38 |
2.3. The indexing configuration
|
39 |
|
39 |
|
40 |
2.4. Starting indexing
|
40 |
2.4. Periodic indexing
|
41 |
|
41 |
|
|
|
42 |
2.4.1. Starting indexing
|
|
|
43 |
|
42 |
2.5. Using cron to automate indexing
|
44 |
2.4.2. Using cron to automate indexing
|
|
|
45 |
|
|
|
46 |
2.5. Real time indexing
|
43 |
|
47 |
|
44 |
3. Search
|
48 |
3. Search
|
45 |
|
49 |
|
46 |
3.1. Simple search
|
50 |
3.1. Simple search
|
47 |
|
51 |
|
|
... |
|
... |
51 |
|
55 |
|
52 |
3.3. The preview window
|
56 |
3.3. The preview window
|
53 |
|
57 |
|
54 |
3.4. Complex/advanced search
|
58 |
3.4. Complex/advanced search
|
55 |
|
59 |
|
|
|
60 |
3.5. The term explorer tool
|
|
|
61 |
|
56 |
3.5. Multiple databases
|
62 |
3.6. Multiple databases
|
57 |
|
63 |
|
58 |
3.6. Document history
|
64 |
3.7. Document history
|
59 |
|
65 |
|
60 |
3.7. Sorting search results
|
66 |
3.8. Sorting search results
|
61 |
|
67 |
|
62 |
3.8. Search tips, shortcuts
|
68 |
3.9. Search tips, shortcuts
|
63 |
|
69 |
|
64 |
3.9. Customizing the search interface
|
70 |
3.10. Customizing the search interface
|
65 |
|
71 |
|
66 |
4. Installation
|
72 |
4. Installation
|
67 |
|
73 |
|
68 |
4.1. Installing a prebuilt copy
|
74 |
4.1. Installing a prebuilt copy
|
69 |
|
75 |
|
70 |
4.1.1. Installing through a package system
|
76 |
4.1.1. Installing through a package system
|
71 |
|
77 |
|
72 |
4.1.2. Installing a prebuilt Recoll
|
78 |
4.1.2. Installing a prebuilt Recoll
|
73 |
|
79 |
|
74 |
4.2. Packages needed for external file types
|
80 |
4.2. Supporting packages
|
75 |
|
81 |
|
76 |
4.3. Building from source
|
82 |
4.3. Building from source
|
77 |
|
83 |
|
78 |
4.3.1. Prerequisites
|
84 |
4.3.1. Prerequisites
|
79 |
|
85 |
|
|
... |
|
... |
138 |
(example: floor, floors, floored, flooring...). Recoll will by default
|
144 |
(example: floor, floors, floored, flooring...). Recoll will by default
|
139 |
expand queries to all such related terms (words that reduce to the same
|
145 |
expand queries to all such related terms (words that reduce to the same
|
140 |
stem). This expansion can be disabled at search time.
|
146 |
stem). This expansion can be disabled at search time.
|
141 |
|
147 |
|
142 |
Stemming, by itself, does not accommodate for misspellings or phonetic
|
148 |
Stemming, by itself, does not accommodate for misspellings or phonetic
|
143 |
searches. Recoll currently does not support these features.
|
149 |
searches. Recoll supports these features through a specific tool (the term
|
|
|
150 |
explorer) which will let you explore the set of index terms along
|
|
|
151 |
different modes.
|
144 |
|
152 |
|
145 |
----------------------------------------------------------------------
|
153 |
----------------------------------------------------------------------
|
146 |
|
154 |
|
147 |
1.3. Recoll overview
|
155 |
1.3. Recoll overview
|
148 |
|
156 |
|
|
... |
|
... |
200 |
documents will only be processed if they have been modified. On the first
|
208 |
documents will only be processed if they have been modified. On the first
|
201 |
execution, of course, all documents will need processing. A full index
|
209 |
execution, of course, all documents will need processing. A full index
|
202 |
build can be forced later on by specifying an option to the indexing
|
210 |
build can be forced later on by specifying an option to the indexing
|
203 |
command (recollindex -z).
|
211 |
command (recollindex -z).
|
204 |
|
212 |
|
205 |
Recoll indexing takes place at discrete times. There is currently no
|
213 |
Recoll indexing can be performed with two different methods:
|
206 |
interface to real time file modification monitors. The typical usage is to
|
214 |
|
|
|
215 |
* Periodic indexing: indexing takes place at discrete times, by
|
|
|
216 |
executing the recollindex command. The typical usage is to have a
|
207 |
have a nightly indexing run programmed into your cron file.
|
217 |
nightly indexing run programmed into your cron file.
|
208 |
|
218 |
|
209 |
+------------------------------------------------------------------------+
|
219 |
* Real time indexing: indexing takes place as soon as a file is created
|
210 |
| There is nothing in Recoll and Xapian that would prevent interfacing |
|
220 |
or changed. recollindex runs as a daemon and uses a file system
|
211 |
| with a real time file modification monitor, but this would tend to |
|
221 |
alteration monitor such as Fam, Gamin or inotify to detect file
|
|
|
222 |
changes. Monitoring a big directory tree can consume significant
|
|
|
223 |
system resources.
|
|
|
224 |
|
|
|
225 |
The choice between the two methods is mostly a matter of preference, and
|
|
|
226 |
they can be combined by setting up multiple indexes (ie: use periodic
|
|
|
227 |
indexing on a big documentation directory, and real time indexing on a
|
|
|
228 |
small home directory). Monitoring a big file system tree can consume
|
212 |
| consume significant system resources for dubious gain, because you |
|
229 |
significant system resources, for dubious gains.
|
213 |
| rarely need a full text search to find documents you just modified. |
|
230 |
|
214 |
| recollindex -i can be used to add individual files to the index if you |
|
231 |
|
215 |
| want to play with this, see the manual page. |
|
|
|
216 |
+------------------------------------------------------------------------+
|
|
|
217 |
|
232 |
|
218 |
Recoll knows about quite a few different document types. The parameters
|
233 |
Recoll knows about quite a few different document types. The parameters
|
219 |
for document types recognition and processing are set in configuration
|
234 |
for document types recognition and processing are set in configuration
|
220 |
files. Most file types, like HTML or word processing files, only hold one
|
235 |
files. Most file types, like HTML or word processing files, only hold one
|
221 |
document. Some file types, like mail folder files can hold many
|
236 |
document. Some file types, like mail folder files can hold many
|
|
... |
|
... |
229 |
from your home directory, with a reasonable set of defaults.
|
244 |
from your home directory, with a reasonable set of defaults.
|
230 |
|
245 |
|
231 |
In some cases, it may be interesting to index different areas of the file
|
246 |
In some cases, it may be interesting to index different areas of the file
|
232 |
system to separate databases. You can do this by using multiple
|
247 |
system to separate databases. You can do this by using multiple
|
233 |
configuration directories, each indexing a file system area to a specific
|
248 |
configuration directories, each indexing a file system area to a specific
|
234 |
database. You would use the RECOLL_CONFDIR environment variable or the -c
|
249 |
database. See the section about using multiple databases for more
|
235 |
confdir option to recollindex to indicate which configuration to process.
|
250 |
information on multiple configurations and indexes.
|
236 |
The recoll search program can use any selection of the existing databases
|
|
|
237 |
for each search, this is configurable inside the user interface.
|
|
|
238 |
|
251 |
|
239 |
----------------------------------------------------------------------
|
252 |
----------------------------------------------------------------------
|
240 |
|
253 |
|
241 |
2.2. Index storage
|
254 |
2.2. Index storage
|
242 |
|
255 |
|
243 |
The default location for the index data is the $HOME/.recoll/xapiandb/
|
256 |
The default location for the index data is the xapiandb subdirectory of
|
244 |
directory. This can be changed by setting the RECOLL_CONFDIR environment
|
257 |
the Recoll configuration directory, typically $HOME/.recoll/xapiandb/.
|
245 |
variable, or by specifying the dbdir parameter in the configuration file
|
258 |
This can be changed via two different methods (with different purposes):
|
246 |
(see the configuration section).
|
259 |
|
|
|
260 |
* You can specify a different configuration directory by setting the
|
|
|
261 |
RECOLL_CONFDIR environment variable, or using the -c option to the
|
|
|
262 |
Recoll commands. This method would typically be used to index
|
|
|
263 |
different areas of the file system to different indexes. For example,
|
|
|
264 |
if you were to issue the following commands:
|
|
|
265 |
|
|
|
266 |
export RECOLL_CONFDIR=~/.indexes-email
|
|
|
267 |
recoll
|
|
|
268 |
|
|
|
269 |
|
|
|
270 |
Then Recoll would use configuration files stored in ~/.indexes-email/
|
|
|
271 |
and (unless specified otherwise in recoll.conf) would look for the
|
|
|
272 |
index in ~/.indexes-email/xapiandb/.
|
|
|
273 |
|
|
|
274 |
Using multiple configuration directories and configuration options
|
|
|
275 |
allows you to tailor multiple configurations and indexes to handle
|
|
|
276 |
whatever subset of the available data that you wish to make
|
|
|
277 |
searchable.
|
|
|
278 |
|
|
|
279 |
* You can also specify a different storage location for the index by
|
|
|
280 |
setting the dbdir parameter in the configuration file (see the
|
|
|
281 |
configuration section). This method would mainly be of use if you
|
|
|
282 |
wanted to keep the configuration directory in its default location,
|
|
|
283 |
but desired another location for the index, typically out of disk
|
|
|
284 |
occupation concerns.
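
The default locations described above can be summarised with a small shell sketch. It is only an illustration: the function names recoll_confdir and recoll_default_dbdir are hypothetical and not part of Recoll, the -c option is not modelled, and recoll.conf is not read, so the result is only the index location that applies when dbdir is not set.

```shell
# Hypothetical helpers, not part of Recoll.

# Configuration directory: RECOLL_CONFDIR if set, else $HOME/.recoll.
recoll_confdir() {
    printf '%s\n' "${RECOLL_CONFDIR:-$HOME/.recoll}"
}

# Default index location: the xapiandb subdirectory of the configuration
# directory. A dbdir setting in recoll.conf would override this; the
# sketch does not parse the configuration file.
recoll_default_dbdir() {
    printf '%s/xapiandb\n' "$(recoll_confdir)"
}
```

With RECOLL_CONFDIR pointing at a second configuration directory, recoll_default_dbdir prints the xapiandb path under that directory instead of the one under $HOME/.recoll.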
|
247 |
|
285 |
|
248 |
The size of the index is determined by the size of the set of documents,
|
286 |
The size of the index is determined by the size of the set of documents,
|
249 |
but the ratio can vary a lot. For a typical mixed set of documents, the
|
287 |
but the ratio can vary a lot. For a typical mixed set of documents, the
|
250 |
index size will often be close to the data set size. In specific cases (a
|
288 |
index size will often be close to the data set size. In specific cases (a
|
251 |
set of compressed mbox files for example), the index can become much
|
289 |
set of compressed mbox files for example), the index can become much
|
|
... |
|
... |
255 |
|
293 |
|
256 |
Of course, images, sound and video do not increase the index size, which
|
294 |
Of course, images, sound and video do not increase the index size, which
|
257 |
means that it will be quite typical nowadays (2006), that even a big index
|
295 |
means that it will be quite typical nowadays (2006), that even a big index
|
258 |
will be negligible against the total amount of data on the computer.
|
296 |
will be negligible against the total amount of data on the computer.
|
259 |
|
297 |
|
260 |
The index data directory (xapiandb) only contains data that will be
|
298 |
The index data directory (xapiandb) only contains data that can be
|
261 |
rebuilt by an index run, and it can always be destroyed safely.
|
299 |
completely rebuilt by an index run, and it can always be destroyed safely.
|
262 |
|
300 |
|
263 |
----------------------------------------------------------------------
|
301 |
----------------------------------------------------------------------
|
264 |
|
302 |
|
265 |
2.2.1. Security aspects
|
303 |
2.2.1. Security aspects
|
266 |
|
304 |
|
|
... |
|
... |
280 |
|
318 |
|
281 |
----------------------------------------------------------------------
|
319 |
----------------------------------------------------------------------
|
282 |
|
320 |
|
283 |
2.3. The indexing configuration
|
321 |
2.3. The indexing configuration
|
284 |
|
322 |
|
285 |
Values set in the system-wide configuration file (named like
|
323 |
You can control which areas of the file system are indexed, and how files
|
286 |
/usr/[local/]share/recoll/examples/recoll.conf) can be overridden by those
|
324 |
are processed, by setting variables inside the Recoll configuration files.
|
287 |
set in the personal one, named $HOME/.recoll/recoll.conf by default or
|
|
|
288 |
$RECOLL_CONFDIR/recoll.conf if RECOLL_CONFDIR is set.
|
|
|
289 |
|
325 |
|
290 |
The most accurate documentation for editing the file is given by comments
|
326 |
You can also use multiple indexes defined by separate configurations,
|
291 |
inside the central one. If you want to adjust the configuration before
|
327 |
typically to separate personal and shared indexes, or to take advantage of
|
292 |
indexing, just click Cancel when the program asks if it should start
|
328 |
the organization of your data to improve search precision.
|
|
|
329 |
|
|
|
330 |
The first time you start recoll, you will be asked whether or not you
|
|
|
331 |
would like recoll to build the index. If you want to adjust the
|
|
|
332 |
configuration before indexing, just click Cancel at this point. That way,
|
293 |
initial indexing. This will have created a .recoll directory containing
|
333 |
recoll will have created a ~/.recoll directory containing empty
|
294 |
empty configuration files.
|
334 |
configuration files.
|
295 |
|
335 |
|
296 |
The configuration is also documented inside the installation chapter of
|
336 |
The configuration is documented inside the installation chapter of this
|
297 |
this document, or in the recoll.conf(5) man page.
|
337 |
document, or in the recoll.conf(5) man page. The most immediately useful
|
|
|
338 |
variable you may be interested in is probably topdirs, which determines what
|
|
|
339 |
subtrees get indexed.
|
298 |
|
340 |
|
299 |
The applications needed to index file types other than text, HTML or email
|
341 |
The applications needed to index file types other than text, HTML or email
|
300 |
(ie: pdf, postscript, ms-word...) are described in the external packages
|
342 |
(ie: pdf, postscript, ms-word...) are described in the external packages
|
301 |
section
|
343 |
section
|
302 |
|
344 |
|
303 |
----------------------------------------------------------------------
|
345 |
----------------------------------------------------------------------
|
304 |
|
346 |
|
|
|
347 |
2.4. Periodic indexing
|
|
|
348 |
|
305 |
2.4. Starting indexing
|
349 |
2.4.1. Starting indexing
|
306 |
|
350 |
|
307 |
Indexing is performed either by the recollindex program, or by the
|
351 |
Indexing is performed either by the recollindex program, or by the
|
308 |
indexing thread inside the recoll program (use the File menu). Both
|
352 |
indexing thread inside the recoll program (use the File menu). Both
|
309 |
programs will use the RECOLL_CONFDIR variable or accept a -c confdir
|
353 |
programs will use the RECOLL_CONFDIR variable or accept a -c confdir
|
310 |
option to specify the configuration directory to be used.
|
354 |
option to specify the configuration directory to be used.
|
|
... |
|
... |
312 |
If the recoll program finds no index when it starts, it will automatically
|
356 |
If the recoll program finds no index when it starts, it will automatically
|
313 |
start indexing (except if canceled).
|
357 |
start indexing (except if canceled).
|
314 |
|
358 |
|
315 |
It is best to avoid interrupting the indexing process, as this may
|
359 |
It is best to avoid interrupting the indexing process, as this may
|
316 |
sometimes leave the index in a bad state. This is not a serious problem,
|
360 |
sometimes leave the index in a bad state. This is not a serious problem,
|
317 |
as you then just need to clear everything and restart the indexing: the
|
361 |
as you then just need to delete the index files and restart the indexing.
|
318 |
index files are normally stored in the $HOME/.recoll/xapiandb directory,
|
362 |
The index files are normally stored in the $HOME/.recoll/xapiandb
|
319 |
which you can just delete if needed. Alternatively, you can start
|
363 |
directory, which you can just delete if needed. Alternatively, you can
|
320 |
recollindex with option -z, which will reset the database before indexing.
|
364 |
start recollindex with option -z, which will reset the database before
|
|
|
365 |
indexing.
|
321 |
|
366 |
|
322 |
----------------------------------------------------------------------
|
367 |
----------------------------------------------------------------------
|
323 |
|
368 |
|
324 |
2.5. Using cron to automate indexing
|
369 |
2.4.2. Using cron to automate indexing
|
325 |
|
370 |
|
326 |
The most common way to set up indexing is to have a cron task execute it
|
371 |
The most common way to set up indexing is to have a cron task execute it
|
327 |
every night. For example the following crontab entry would do it every day
|
372 |
every night. For example the following crontab entry would do it every day
|
328 |
at 3:30AM (supposing recollindex is in your PATH):
|
373 |
at 3:30AM (supposing recollindex is in your PATH):
|
329 |
|
374 |
|
330 |
30 3 * * * recollindex > /tmp/recolltrace 2>&1
|
375 |
30 3 * * * recollindex > /tmp/recolltrace 2>&1
|
331 |
|
376 |
|
332 |
The usual command to edit your crontab is crontab -e (which will usually
|
377 |
The usual command to edit your crontab is crontab -e (which will usually
|
333 |
start the vi editor to edit the file). You may have more sophisticated
|
378 |
start the vi editor to edit the file). You may have more sophisticated
|
334 |
tools available on your system.
|
379 |
tools available on your system.
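
If you use several configuration directories, each one would need its own crontab entry. The sketch below prints such entries using the -c confdir option mentioned earlier. The helper name cron_entry, the second configuration directory and the per-configuration log file names are assumptions made for illustration only.

```shell
# Hypothetical helper: print a crontab line that runs recollindex for
# one configuration directory at the given time.
# $1 = minute, $2 = hour, $3 = configuration directory, $4 = log file
# (paths containing spaces are not handled by this sketch)
cron_entry() {
    printf '%s %s * * * recollindex -c %s > %s 2>&1\n' "$1" "$2" "$3" "$4"
}

# Example: two indexes updated half an hour apart (example paths).
cron_entry 30 3 "$HOME/.recoll" /tmp/recolltrace
cron_entry 0 4 "$HOME/.indexes-email" /tmp/recolltrace-email
```

The printed lines can be pasted into the file opened by crontab -e. Nothing is installed automatically.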
|
|
|
380 |
|
|
|
381 |
----------------------------------------------------------------------
|
|
|
382 |
|
|
|
383 |
2.5. Real time indexing
|
|
|
384 |
|
|
|
385 |
Real time monitoring/indexing is performed by starting the recollindex -m
|
|
|
386 |
command. With this option, recollindex will detach from the terminal and
|
|
|
387 |
become a daemon, forever monitoring file changes and updating the index.
|
|
|
388 |
|
|
|
389 |
The package must have been configured with option --with-fam or
|
|
|
390 |
--with-inotify for the monitoring code and option to be enabled in
|
|
|
391 |
recollindex. This is not currently the default.
|
|
|
392 |
|
|
|
393 |
The rclmon.sh script can be used to easily start and stop the daemon. It
|
|
|
394 |
can be found in the examples directory (typically
|
|
|
395 |
/usr/local/[share/]recoll/examples).
|
|
|
396 |
|
|
|
397 |
Starting and stopping the daemon could be performed, for example, as part
|
|
|
398 |
of the user session script. For example, my out of fashion xdm-based
|
|
|
399 |
session has an .xsession script with the following lines at the end:
|
|
|
400 |
|
|
|
401 |
recollconf=$HOME/.recoll-home
|
|
|
402 |
recolldata=/usr/local/share/recoll
|
|
|
403 |
RECOLL_CONFDIR=$recollconf $recolldata/examples/rclmon.sh start
|
|
|
404 |
|
|
|
405 |
fvwm
|
|
|
406 |
|
|
|
407 |
RECOLL_CONFDIR=$recollconf $recolldata/examples/rclmon.sh stop
|
|
|
408 |
|
|
|
409 |
The indexing daemon gets started, then the window manager, for which the
|
|
|
410 |
session waits. When the window manager exits, the indexing daemon is
|
|
|
411 |
stopped, then the session ends (at script exit). This should be adjusted
|
|
|
412 |
for your flavour of session management, and of course, there are other
|
|
|
413 |
possibilities.
|
|
|
414 |
|
|
|
415 |
By default, the indexing daemon will write its messages to a file inside
|
|
|
416 |
the configuration directory (this is controlled by the daemlogfilename and
|
|
|
417 |
daemloglevel configuration parameters). You may want to change this. Also
|
|
|
418 |
the log file will only be truncated when the daemon starts. If the daemon
|
|
|
419 |
runs permanently, the log file may grow quite big, depending on the log
|
|
|
420 |
level.
|
|
|
421 |
|
|
|
422 |
The real time indexing code is relatively young, and there are still a few
|
|
|
423 |
quirks. File deletions occurring while the monitor is not running will not
|
|
|
424 |
be detected. You'll have to run a normal incremental indexing pass from
|
|
|
425 |
time to time to purge the database. There may still be other problems.
|
335 |
|
426 |
|
336 |
----------------------------------------------------------------------
|
427 |
----------------------------------------------------------------------
|
337 |
|
428 |
|
338 |
Chapter 3. Search
|
429 |
Chapter 3. Search
|
339 |
|
430 |
|
|
... |
|
... |
370 |
Recoll remembers the last few searches that you performed. You can use the
|
461 |
Recoll remembers the last few searches that you performed. You can use the
|
371 |
simple search text entry widget (a combobox) to recall them (click on the
|
462 |
simple search text entry widget (a combobox) to recall them (click on the
|
372 |
thing at the right of the text field). Please note, however, that only the
|
463 |
thing at the right of the text field). Please note, however, that only the
|
373 |
search texts are remembered, not the mode (all/any/file name).
|
464 |
search texts are remembered, not the mode (all/any/file name).
|
374 |
|
465 |
|
375 |
Hitting ^Tab (Ctrl + Tab) while entering a word in the simple search entry
|
466 |
Typing Esc Space while entering a word in the simple search entry will
|
376 |
will open a window with possible completions for the word. The completions
|
467 |
open a window with possible completions for the word. The completions are
|
377 |
are extracted from the database.
|
468 |
extracted from the database.
|
378 |
|
469 |
|
379 |
Double-clicking on a word in the result list or a preview window will
|
470 |
Double-clicking on a word in the result list or a preview window will
|
380 |
insert it into the simple search entry field.
|
471 |
insert it into the simple search entry field.
|
381 |
|
472 |
|
382 |
You can use the Tools / Advanced search dialog for more complex searches.
|
473 |
You can use the Tools / Advanced search dialog for more complex searches.
|
|
... |
|
... |
391 |
By default, the document list is presented in order of relevance (how well
|
482 |
By default, the document list is presented in order of relevance (how well
|
392 |
the system estimates that the document matches the query). You can specify
|
483 |
the system estimates that the document matches the query). You can specify
|
393 |
a different ordering by using the Tools / Sort parameters dialog.
|
484 |
a different ordering by using the Tools / Sort parameters dialog.
|
394 |
|
485 |
|
395 |
Clicking on the Preview link for an entry will open an internal preview
|
486 |
Clicking on the Preview link for an entry will open an internal preview
|
396 |
window for the document. Clicking the Edit link will attempt to start an
|
487 |
window for the document. Further Preview clicks for the same search will
|
397 |
external viewer (have a look at the mimeconf configuration file to see how
|
488 |
open tabs in the existing preview window. You can use Shift+Click to force
|
398 |
these are configured).
|
489 |
the creation of another preview window, which may be useful to view the
|
|
|
490 |
documents side by side.
|
|
|
491 |
|
|
|
492 |
Clicking the Edit link will attempt to start an external viewer (have a
|
|
|
493 |
look at the mimeconf configuration file to see how these are configured).
|
399 |
|
494 |
|
400 |
The Preview and Edit links may not be present for all entries,
|
495 |
The Preview and Edit links may not be present for all entries,
|
401 |
meaning that Recoll has no configured way to preview a given file type
|
496 |
meaning that Recoll has no configured way to preview a given file type
|
402 |
(which was indexed by name only), or no configured external viewer for the
|
497 |
(which was indexed by name only), or no configured external viewer for the
|
403 |
file type. This can sometimes be adjusted simply by tweaking the mimemap
|
498 |
file type. This can sometimes be adjusted simply by tweaking the mimemap
|
|
... |
|
... |
479 |
----------------------------------------------------------------------
|
574 |
----------------------------------------------------------------------
|
480 |
|
575 |
|
481 |
3.4. Complex/advanced search
|
576 |
3.4. Complex/advanced search
|
482 |
|
577 |
|
483 |
The advanced search dialog has fields that will allow a more refined
|
578 |
The advanced search dialog has fields that will allow a more refined
|
484 |
search, looking for documents with all given elements, a given exact
|
579 |
search. It has a number of entry fields, each of which is configurable for
|
485 |
phrase, none of the given elements, or a given file name (with wildcard
|
580 |
the following modes:
|
|
|
581 |
|
|
|
582 |
* All terms.
|
|
|
583 |
|
|
|
584 |
* Any term.
|
|
|
585 |
|
|
|
586 |
* None of the terms.
|
|
|
587 |
|
|
|
588 |
* Phrase (exact terms in order within an adjustable window).
|
|
|
589 |
|
|
|
590 |
* Proximity (terms in any order within an adjustable window).
|
|
|
591 |
|
|
|
592 |
* Filename search with wildcards.
|
|
|
593 |
|
|
|
594 |
Additional entry fields can be created by clicking the Add clause button.
|
|
|
595 |
|
486 |
expansion). All relevant fields will be combined by an implicit AND
|
596 |
All relevant fields will be combined by an implicit AND or OR conjunction.
|
487 |
clause. All fields except "Exact phrase" can accept a mix of single words
|
597 |
All types of clauses except "phrase" and "near" can accept a mix of single
|
488 |
and phrases enclosed in double quotes.
|
598 |
words and phrases enclosed in double quotes. Stemming expansion will be
|
|
|
599 |
performed for all terms not beginning with a capital letter, except for
|
|
|
600 |
"phrase" clauses.
|
489 |
|
601 |
|
490 |
Advanced search will let you search for documents of specific mime types
|
602 |
Advanced search will also let you search for documents of specific mime
|
491 |
(ie: only text/plain, or text/HTML or application/pdf etc...). The state
|
603 |
types (ie: only text/plain, or text/html or application/pdf etc...). The
|
492 |
of the file type selection can be saved as the default (the file type
|
604 |
state of the file type selection can be saved as the default (the file
|
493 |
filter will not be activated at program start-up, but the lists will be in
|
605 |
type filter will not be activated at program start-up, but the lists will
|
494 |
the restored state).
|
606 |
be in the restored state).
|
495 |
|
607 |
|
496 |
You can also restrict the search results to a sub-tree of the indexed
|
608 |
You can also restrict the search results to a sub-tree of the indexed
|
497 |
area. If you need to do this often, you may think of setting up multiple
|
609 |
area. If you need to do this often, you may think of setting up multiple
|
498 |
indexes instead, as the performance will be much better.
|
610 |
indexes instead, as the performance will be much better.
|
499 |
|
611 |
|
|
... |
|
... |
504 |
Click on the Show query details link at the top of the result page to see
|
616 |
Click on the Show query details link at the top of the result page to see
|
505 |
the query expansion.
|
617 |
the query expansion.
|
506 |
|
618 |
|
507 |
----------------------------------------------------------------------
|
619 |
----------------------------------------------------------------------
|
508 |
|
620 |
|
|
|
621 |
3.5. The term explorer tool
|
|
|
622 |
|
|
|
623 |
Recoll automatically manages the expansion of search terms to their
|
|
|
624 |
derivatives (ie: plural/singular, verb inflections). But there are other
|
|
|
625 |
cases where the exact search term is not known. For example, you may not
|
|
|
626 |
remember the exact spelling, or only know the beginning of the name.
|
|
|
627 |
|
|
|
628 |
The term explorer tool (started from the toolbar icon or from the Term
|
|
|
629 |
explorer entry of the Tools menu) can be used to search the full index
|
|
|
630 |
terms list. It has four modes of operation:
|
|
|
631 |
|
|
|
632 |
Wildcard
|
|
|
633 |
|
|
|
634 |
In this mode of operation, you can enter a search string with
|
|
|
635 |
shell-like wildcards (*, ?). ie: xapi* .
|
|
|
636 |
|
|
|
637 |
Regular expression
|
|
|
638 |
|
|
|
639 |
This mode will accept a regular expression as input. Example:
|
|
|
640 |
word[0-9]+ . The regular expression is anchored by enclosing in ^
|
|
|
641 |
and $ before execution.
|
|
|
642 |
|
|
|
643 |
Stem expansion
|
|
|
644 |
|
|
|
645 |
This mode will perform the usual stem expansion normally done as
|
|
|
646 |
part of user input processing. As such it is probably mostly useful
|
|
|
647 |
to demonstrate the process.
|
|
|
648 |
|
|
|
649 |
Spelling/Phonetic
|
|
|
650 |
|
|
|
651 |
In this mode, you enter the term as you think it is spelled, and
|
|
|
652 |
Recoll will do its best to find index terms that sound like your
|
|
|
653 |
entry. This mode uses the Aspell spelling application, which must
|
|
|
654 |
be installed on your system for things to work. The language which
|
|
|
655 |
is used to build the dictionary out of the index terms (which is
|
|
|
656 |
done at the end of an indexing pass) is the one defined by your
|
|
|
657 |
NLS environment. Weird things will probably happen if languages
|
|
|
658 |
are mixed up.
|
|
|
659 |
|
|
|
660 |
Note that in cases where Recoll does not know the beginning of the string
|
|
|
661 |
to search for (ie a wildcard expression like *coll), the expansion can
|
|
|
662 |
take quite a long time because the full index term list will have to be
|
|
|
663 |
processed. The expansion is currently limited to 200 results for wildcards
|
|
|
664 |
and regular expressions.
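
The difference between the wildcard and regular expression modes can be illustrated with plain shell tools on a made-up term list. This is only a sketch of the matching semantics described above (whole-term wildcard match, regular expression enclosed in ^ and $). The function names and the sample terms are hypothetical, and nothing here reflects how Recoll itself walks the index.

```shell
# Made-up list of index terms, one per line.
terms='xapian
xapiandb
word1
word42
password1
recoll'

# Wildcard mode: a shell-like pattern is matched against the whole term.
match_wildcard() {
    printf '%s\n' "$terms" | while IFS= read -r t; do
        case $t in
            $1) printf '%s\n' "$t" ;;
        esac
    done
}

# Regular expression mode: the expression is enclosed in ^ and $ before
# being applied, so it must match the whole term.
match_regex() {
    printf '%s\n' "$terms" | grep -E "^$1\$"
}

match_wildcard 'xapi*'      # prints xapian and xapiandb
match_regex 'word[0-9]+'    # prints word1 and word42, not password1
```

Because of the anchoring, word[0-9]+ does not match password1. A pattern such as *coll has no fixed beginning, which is why every term in the list has to be examined.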
|
|
|
665 |
|
|
|
666 |
Double-clicking on a term in the result list will insert it into the
|
|
|
667 |
simple search entry field. You can also cut/paste between the result list
|
|
|
668 |
and any entry field (the end of lines will be taken care of).
|
|
|
669 |
|
|
|
670 |
----------------------------------------------------------------------
|
|
|
671 |
|
509 |
3.5. Multiple databases
|
672 |
3.6. Multiple databases
|
510 |
|
673 |
|
511 |
Multiple Recoll databases or indexes can be created by using several
|
674 |
Multiple Recoll databases or indexes can be created by using several
|
512 |
configuration directories which are usually set to index different areas
|
675 |
configuration directories which are usually set to index different areas
|
513 |
of the file system. A specific index can be selected for updating or
|
676 |
of the file system. A specific index can be selected for updating or
|
514 |
searching, using the RECOLL_CONFDIR environment variable or the -c option
|
677 |
searching, using the RECOLL_CONFDIR environment variable or the -c option
|
|
... |
|
... |
550 |
search, but multiple indexes will have much better performance and may be
|
713 |
search, but multiple indexes will have much better performance and may be
|
551 |
worth the trouble.
|
714 |
worth the trouble.
|
552 |
|
715 |
|
553 |
----------------------------------------------------------------------
|
716 |
----------------------------------------------------------------------
|
554 |
|
717 |
|
555 |
3.6. Document history
|
718 |
3.7. Document history
|
556 |
|
719 |
|
557 |
Documents that you actually view (with the internal preview or an external
|
720 |
Documents that you actually view (with the internal preview or an external
|
558 |
tool) are entered into the document history, which is remembered. You can
|
721 |
tool) are entered into the document history, which is remembered. You can
|
559 |
display the history list by using the Tools/Doc History menu entry.
|
722 |
display the history list by using the Tools/Doc History menu entry.
|
560 |
|
723 |
|
561 |
----------------------------------------------------------------------
|
724 |
----------------------------------------------------------------------
|
562 |
|
725 |
|
563 |
3.7. Sorting search results
|
726 |
3.8. Sorting search results
|
564 |
|
727 |
|
565 |
The documents in a result list are normally sorted in order of relevance.
|
728 |
The documents in a result list are normally sorted in order of relevance.
|
566 |
It is possible to specify different sort parameters by using the Sort
|
729 |
It is possible to specify different sort parameters by using the Sort
|
567 |
parameters dialog (located in the Tools menu).
|
730 |
parameters dialog (located in the Tools menu).
|
568 |
|
731 |
|
|
... |
|
... |
573 |
The sort parameters stay in effect until they are explicitly reset, or the
|
736 |
The sort parameters stay in effect until they are explicitly reset, or the
|
574 |
program exits. An activated sort is indicated in the result list header.
|
737 |
program exits. An activated sort is indicated in the result list header.
|
575 |
|
738 |
|
576 |
----------------------------------------------------------------------
|
739 |
----------------------------------------------------------------------
|
577 |
|
740 |
|
578 |
3.8. Search tips, shortcuts
|
741 |
3.9. Search tips, shortcuts
|
579 |
|
742 |
|
580 |
Term completion. Typing ^TAB (Control + Tab) in the simple search entry
|
743 |
Term completion. Typing Esc Space in the simple search entry field while
|
581 |
field while entering a word will either complete the current word if its
|
744 |
entering a word will either complete the current word if its beginning
|
582 |
beginning matches a unique term in the index, or open a window to propose
|
745 |
matches a unique term in the index, or open a window to propose a list of
|
583 |
a list of completions
|
746 |
completions.
|
584 |
|
747 |
|
585 |
Picking up new terms from result or preview text. Double-clicking on a
|
748 |
Picking up new terms from result or preview text. Double-clicking on a
|
586 |
word in the result list or in a preview window will copy it to the simple
|
749 |
word in the result list or in a preview window will copy it to the simple
|
587 |
search entry field.
|
750 |
search entry field.
|
588 |
|
751 |
|
|
... |
|
... |
601 |
|
764 |
|
602 |
Browsing the result list inside a preview window (1.5). Entering
|
765 |
Browsing the result list inside a preview window (1.5). Entering
|
603 |
Shift-Down or Shift-Up (Shift + an arrow key) in a preview window will
|
766 |
Shift-Down or Shift-Up (Shift + an arrow key) in a preview window will
|
604 |
display the next or the previous document from the result list. Any
|
767 |
display the next or the previous document from the result list. Any
|
605 |
secondary search currently active will be executed on the new document.
|
768 |
secondary search currently active will be executed on the new document.
|
|
|
769 |
|
|
|
770 |
Forced opening of a preview window (1.6). You can use Shift+Click on a
|
|
|
771 |
result list Preview link to force the creation of a preview window instead
|
|
|
772 |
of a new tab in the existing one.
|
606 |
|
773 |
|
607 |
AutoPhrases (1.5). This option can be set in the preferences dialog. If it
|
774 |
AutoPhrases (1.5). This option can be set in the preferences dialog. If it
|
608 |
is set, a phrase will be automatically built and added to simple searches
|
775 |
is set, a phrase will be automatically built and added to simple searches
|
609 |
when looking for Any terms. This will not radically change the results,
|
776 |
when looking for Any terms. This will not radically change the results,
|
610 |
but will give a relevance boost to the results where the search terms
|
777 |
but will give a relevance boost to the results where the search terms
|
|
... |
|
... |
635 |
|
802 |
|
636 |
Quitting. Entering ^Q almost anywhere will close the application.
|
803 |
Quitting. Entering ^Q almost anywhere will close the application.
|
637 |
|
804 |
|
638 |
----------------------------------------------------------------------
|
805 |
----------------------------------------------------------------------
|
639 |
|
806 |
|
640 |
3.9. Customizing the search interface
|
807 |
3.10. Customizing the search interface
|
641 |
|
808 |
|
642 |
It is possible to customize some aspects of the search interface by using
|
809 |
It is possible to customize some aspects of the search interface by using
|
643 |
the Query configuration entry in the Preferences menu.
|
810 |
the Query configuration entry in the Preferences menu.
|
644 |
|
811 |
|
645 |
There are two tabs in the dialog, dealing with the interface itself, and
|
812 |
There are two tabs in the dialog, dealing with the interface itself, and
|
|
... |
|
... |
651 |
|
818 |
|
652 |
* Result list font: There is quite a lot of information shown in the
|
819 |
* Result list font: There is quite a lot of information shown in the
|
653 |
result list, and you may want to customize the font and/or font size.
|
820 |
result list, and you may want to customize the font and/or font size.
|
654 |
The rest of the fonts used by Recoll are determined by your generic QT
|
821 |
The rest of the fonts used by Recoll are determined by your generic QT
|
655 |
config (try the qtconfig command).
|
822 |
config (try the qtconfig command).
|
|
|
823 |
|
|
|
824 |
* Result paragraph format string: allows you to change the presentation
|
|
|
825 |
of each result list entry. This is a qt-html string where the
|
|
|
826 |
following printf-like % substitutions will be performed:
|
|
|
827 |
|
|
|
828 |
* %A. Abstract
|
|
|
829 |
|
|
|
830 |
* %D. Date
|
|
|
831 |
|
|
|
832 |
* %K. Keywords (if any)
|
|
|
833 |
|
|
|
834 |
* %L. Preview and Edit links
|
|
|
835 |
|
|
|
836 |
* %M. Mime type
|
|
|
837 |
|
|
|
838 |
* %N. result Number
|
|
|
839 |
|
|
|
840 |
* %R. Relevance percentage
|
|
|
841 |
|
|
|
842 |
* %S. Size information
|
|
|
843 |
|
|
|
844 |
* %T. Title
|
|
|
845 |
|
|
|
846 |
* %U. Url
|
|
|
847 |
|
|
|
848 |
The default value for the string is:
|
|
|
849 |
|
|
|
850 |
%R %S %L <b>%T</b><br>
|
|
|
851 |
%M %D <i>%U</i><br>
|
|
|
852 |
%A %K
|
|
|
853 |
|
|
|
854 |
|
|
|
855 |
You may, for example, try the following for a more web-like experience
|
|
|
856 |
(but the document title will not act as a link):
|
|
|
857 |
|
|
|
858 |
<u><b><font size=+1 color=#1111cf>%T</font></b></u><br>
|
|
|
859 |
%A<font color=#008000>%U - %S</font> - %L
|
|
|
860 |
|
656 |
|
861 |
|
657 |
* HTML help browser: this will let you choose your preferred browser
|
862 |
* HTML help browser: this will let you choose your preferred browser
|
658 |
which will be started from the Help menu to read the user manual. You
|
863 |
which will be started from the Help menu to read the user manual. You
|
659 |
can enter a simple name if the command is in your PATH, or browse for
|
864 |
can enter a simple name if the command is in your PATH, or browse for
|
660 |
a full pathname.
|
865 |
a full pathname.
|
|
... |
|
... |
748 |
|
953 |
|
749 |
Finally, you may want to have a look at the configuration section.
|
954 |
Finally, you may want to have a look at the configuration section.
|
750 |
|
955 |
|
751 |
----------------------------------------------------------------------
|
956 |
----------------------------------------------------------------------
|
752 |
|
957 |
|
753 |
4.2. Packages needed for external file types
|
958 |
4.2. Supporting packages
|
754 |
|
959 |
|
755 |
Recoll uses external applications to index some file types. You need to
|
960 |
Recoll uses external applications to index some file types. You need to
|
756 |
install them for the file types that you wish to have indexed (these are
|
961 |
install them for the file types that you wish to have indexed (these are
|
757 |
run-time dependencies. None is needed for building Recoll):
|
962 |
run-time dependencies. None is needed for building Recoll):
|
758 |
|
963 |
|
|
... |
|
... |
797 |
|
1002 |
|
798 |
----------------------------------------------------------------------
|
1003 |
----------------------------------------------------------------------
|
799 |
|
1004 |
|
800 |
4.3.2. Building
|
1005 |
4.3.2. Building
|
801 |
|
1006 |
|
802 |
Recoll has been built on Linux (redhat7.3, mandriva 2005, Fedora Core 3),
|
1007 |
Recoll has been built on Linux (redhat7.3, mandriva 2005/6, Fedora Core
|
803 |
FreeBSD and Solaris 8. If you build on another system, I would very much
|
1008 |
3/4/5), FreeBSD and Solaris 8. If you build on another system, I would
|
804 |
welcome patches.
|
1009 |
very much welcome patches.
|
805 |
|
1010 |
|
806 |
Depending on the qt configuration on your system, you may have to set the
|
1011 |
Depending on the qt configuration on your system, you may have to set the
|
807 |
QTDIR and QMAKESPECS variables in your environment:
|
1012 |
QTDIR and QMAKESPECS variables in your environment:
|
808 |
|
1013 |
|
809 |
* QTDIR should point to the directory above the one that holds the qt
|
1014 |
* QTDIR should point to the directory above the one that holds the qt
|
810 |
include files (ie: qt.h).
|
1015 |
include files (ie: if qt.h is /usr/local/qt/include/qt.h, QTDIR should
|
|
|
1016 |
be /usr/local/qt).
|
811 |
|
1017 |
|
812 |
* QMAKESPECS should be set to the name of one of the qt mkspecs
|
1018 |
* QMAKESPECS should be set to the name of one of the qt mkspecs
|
813 |
sub-directories (e.g. linux-g++).
|
1019 |
sub-directories (e.g. linux-g++).
|
814 |
|
1020 |
|
815 |
On many Linux systems, QTDIR is set by the login scripts, and QMAKESPECS
|
1021 |
On many Linux systems, QTDIR is set by the login scripts, and QMAKESPECS
|
816 |
is not needed because there is a default link in mkspecs/.
|
1022 |
is not needed because there is a default link in mkspecs/.
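As a minimal sketch of the environment setup described above, assuming a Bourne-compatible shell and a qt installation under /usr/local/qt (the path and the linux-g++ value are illustrative only and should be adjusted to match your system):

```shell
# Assumed location: qt.h would then be /usr/local/qt/include/qt.h.
QTDIR=/usr/local/qt
# Assumed mkspecs sub-directory name (Linux with g++).
QMAKESPECS=linux-g++
# Make both variables visible to the build commands run from this shell.
export QTDIR QMAKESPECS
```

The variables only need to be set in the shell session from which configure is run.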
|
817 |
|
1023 |
|
818 |
The Recoll configure script does a better job of checking these variables
|
1024 |
Configure options: --without-aspell will disable the code for phonetic
|
819 |
after release 1.1.1. Before this, unexplained errors will occur during
|
1025 |
matching of search terms. --with-fam or --with-inotify will enable the
|
820 |
compilation if the environment is not set up. Also, for 1.1.0 the qmake
|
1026 |
code for real time indexing. Refer to configure --help output for details.
|
821 |
command should be in your PATH (later releases can also find it in
|
|
|
822 |
$QTDIR/bin).
|
|
|
823 |
|
1027 |
|
824 |
Normal procedure:
|
1028 |
Normal procedure:
|
825 |
|
1029 |
|
826 |
cd recoll-xxx
|
1030 |
cd recoll-xxx
|
827 |
configure
|
1031 |
configure
|
|
... |
|
... |
851 |
You can then proceed to configuration.
|
1055 |
You can then proceed to configuration.
|
852 |
|
1056 |
|
853 |
----------------------------------------------------------------------
|
1057 |
----------------------------------------------------------------------
|
854 |
|
1058 |
|
855 |
4.4. Configuration overview
|
1059 |
4.4. Configuration overview
|
|
|
1060 |
|
|
|
1061 |
Most of the parameters specific to the recoll GUI are set through the
|
|
|
1062 |
Preferences menu and stored in the standard QT place ($HOME/.qt/recollrc).
|
|
|
1063 |
You probably do not want to edit this by hand.
|
|
|
1064 |
|
|
|
1065 |
For other options, Recoll uses text configuration files. You will have to
|
|
|
1066 |
edit them by hand for now (there is still some hope for a GUI
|
|
|
1067 |
configuration tool in the future). The most accurate documentation for the
|
|
|
1068 |
configuration parameters is given by comments inside the default files,
|
|
|
1069 |
and we will just give a general overview here.
|
856 |
|
1070 |
|
857 |
There are two sets of configuration files. The system-wide files are kept
|
1071 |
There are two sets of configuration files. The system-wide files are kept
|
858 |
in a directory named like /usr/[local/]share/recoll/examples; they define
|
1072 |
in a directory named like /usr/[local/]share/recoll/examples; they define
|
859 |
default values for the system. A parallel set of files exists by default
|
1073 |
default values for the system. A parallel set of files exists by default
|
860 |
in the .recoll directory in your home. This directory can be changed with
|
1074 |
in the .recoll directory in your home. This directory can be changed with
|
|
... |
|
... |
864 |
If the .recoll directory does not exist when recoll or recollindex are
|
1078 |
If the .recoll directory does not exist when recoll or recollindex are
|
865 |
started, it will be created with a set of empty configuration files.
|
1079 |
started, it will be created with a set of empty configuration files.
|
866 |
recoll will give you a chance to edit the configuration file before
|
1080 |
recoll will give you a chance to edit the configuration file before
|
867 |
starting indexing. recollindex will proceed immediately.
|
1081 |
starting indexing. recollindex will proceed immediately.
|
868 |
|
1082 |
|
869 |
Most of the parameters specific to the recoll GUI are set through the
|
|
|
870 |
Preferences menu and stored in the standard QT place ($HOME/.qt/recollrc).
|
|
|
871 |
You probably do not want to edit this by hand.
|
|
|
872 |
|
|
|
873 |
For other options, Recoll uses text configuration files. You will have to
|
|
|
874 |
edit them by hand for now (there is still some hope for a GUI
|
|
|
875 |
configuration tool in the future). The most accurate documentation for the
|
|
|
876 |
configuration parameters is given by comments inside the default files,
|
|
|
877 |
and we will just give a general overview here.
|
|
|
878 |
|
|
|
879 |
All configuration files share the same format. For example, a short
|
1083 |
All configuration files share the same format. For example, a short
|
880 |
extract of the main configuration file might look as follows:
|
1084 |
extract of the main configuration file might look as follows:
|
881 |
|
1085 |
|
882 |
# Space-separated list of directories to index.
|
1086 |
# Space-separated list of directories to index.
|
883 |
topdirs = ~/docs /usr/share/doc
|
1087 |
topdirs = ~/docs /usr/share/doc
|
|
... |
|
... |
892 |
|
1096 |
|
893 |
* Parameter assignment (name = value).
|
1097 |
* Parameter assignment (name = value).
|
894 |
|
1098 |
|
895 |
* Section definition ([somedirname]).
|
1099 |
* Section definition ([somedirname]).
|
896 |
|
1100 |
|
897 |
Section lines allow redefining some parameters for a directory sub-tree.
|
1101 |
Section definitions allow redefining some parameters for a directory
|
898 |
Some of the parameters used for indexing are looked up hierarchically from
|
1102 |
sub-tree. They stay in effect until another section definition, or the end
|
899 |
the more to the less specific. Not all parameters can be meaningfully
|
1103 |
of file, is encountered. Some of the parameters used for indexing are
|
900 |
redefined, this is specified for each in the next section.
|
1104 |
looked up hierarchically from the current directory location upwards. Not
|
|
|
1105 |
all parameters can be meaningfully redefined, this is specified for each
|
|
|
1106 |
in the next section.
|
901 |
|
1107 |
|
902 |
The tilde character (~) is expanded in file names to the name of the
|
1108 |
The tilde character (~) is expanded in file names to the name of the
|
903 |
user's home directory.
|
1109 |
user's home directory.
|
904 |
|
1110 |
|
905 |
White space is used for separation inside lists. Elements with embedded
|
1111 |
White space is used for separation inside lists. Elements with embedded
|
|
... |
|
... |
954 |
mail user agents like thunderbird usually store messages in hidden
|
1160 |
mail user agents like thunderbird usually store messages in hidden
|
955 |
directories, and you probably want this indexed. One possible
|
1161 |
directories, and you probably want this indexed. One possible
|
956 |
solution is to have .* in skippedNames, and add things like
|
1162 |
solution is to have .* in skippedNames, and add things like
|
957 |
~/.thunderbird or ~/.evolution in topdirs.
|
1163 |
~/.thunderbird or ~/.evolution in topdirs.
|
958 |
|
1164 |
|
959 |
loglevel
|
1165 |
loglevel, daemloglevel
|
960 |
|
1166 |
|
961 |
Verbosity level for recoll and recollindex. A value of 4 lists
|
1167 |
Verbosity level for recoll and recollindex. A value of 4 lists
|
962 |
quite a lot of debug/information messages. 2 only lists errors.
|
1168 |
quite a lot of debug/information messages. 2 only lists errors.
|
|
|
1169 |
The daem version is specific to the indexing monitor daemon.
|
963 |
|
1170 |
|
964 |
logfilename
|
1171 |
logfilename, daemlogfilename
|
965 |
|
1172 |
|
966 |
Where the messages should go. 'stderr' can be used as a special
|
1173 |
Where the messages should go. 'stderr' can be used as a special
|
967 |
value, and is the default.
|
1174 |
value, and is the default. The daem version is specific to the
|
|
|
1175 |
indexing monitor daemon.
|
968 |
|
1176 |
|
969 |
filtersdir
|
1177 |
filtersdir
|
970 |
|
1178 |
|
971 |
A directory to search for the external filter scripts used to
|
1179 |
A directory to search for the external filter scripts used to
|
972 |
index some types of files. The value should not be changed, except
|
1180 |
index some types of files. The value should not be changed, except
|