|
a/src/README |
|
b/src/README |
|
... |
|
... |
38 |
2.4. Using cron to automate indexation
|
38 |
2.4. Using cron to automate indexation
|
39 |
|
39 |
|
40 |
3. Search
|
40 |
3. Search
|
41 |
|
41 |
|
42 |
3.1. Simple search
|
42 |
3.1. Simple search
|
|
|
43 |
|
|
|
44 |
3.1.1. Filename search
|
43 |
|
45 |
|
44 |
3.2. Complex/advanced search
|
46 |
3.2. Complex/advanced search
|
45 |
|
47 |
|
46 |
3.3. Document history
|
48 |
3.3. Document history
|
47 |
|
49 |
|
|
... |
|
... |
108 |
|
110 |
|
109 |
You do not need to remember in what file or email message you stored a
|
111 |
You do not need to remember in what file or email message you stored a
|
110 |
given piece of information. You just ask for related terms, and the tool
|
112 |
given piece of information. You just ask for related terms, and the tool
|
111 |
will return a list of documents where those terms are prominent.
|
113 |
will return a list of documents where those terms are prominent.
|
112 |
|
114 |
|
113 |
This mode of operation has been made very familiar by www search engines.
|
115 |
This mode of operation has been made very familiar by internet search
|
|
|
116 |
engines.
|
114 |
|
117 |
|
115 |
The notion of relevance is a difficult one, as only you, the user,
|
118 |
The notion of relevance is a difficult one, as only you, the user,
|
116 |
actually know which documents are relevant to your search, and the
|
119 |
actually know which documents are relevant to your search, and the
|
117 |
application can only try to guess. The quality of this guess is probably
|
120 |
application can only try to guess. The quality of this guess is probably
|
118 |
the most important element for a search application.
|
121 |
the most important element for a search application.
|
|
... |
|
... |
153 |
Stemming depends on the document language. Recoll stores the unstemmed
|
156 |
Stemming depends on the document language. Recoll stores the unstemmed
|
154 |
versions of terms and uses auxiliary databases for term expansion. It can
|
157 |
versions of terms and uses auxiliary databases for term expansion. It can
|
155 |
switch stemming languages, or add a language, without reindexing. Storing
|
158 |
switch stemming languages, or add a language, without reindexing. Storing
|
156 |
documents in different languages in the same database is possible, and
|
159 |
documents in different languages in the same database is possible, and
|
157 |
useful in practice, but does introduce possibilities of confusion. Recoll
|
160 |
useful in practice, but does introduce possibilities of confusion. Recoll
|
158 |
makes no attempt at automatic language recognition.
|
161 |
currently makes no attempt at automatic language recognition.
|
159 |
|
162 |
|
160 |
Recoll has many parameters which define exactly what to index, and how to
|
163 |
Recoll has many parameters which define exactly what to index, and how to
|
161 |
classify and decode the source documents. These are kept in a
|
164 |
classify and decode the source documents. These are kept in a
|
162 |
configuration file. A sample configuration is installed into the .recoll
|
165 |
configuration file. A default configuration is copied into a standard
|
163 |
subdirectory of your home directory when you first execute a Recoll
|
166 |
location (usually something like /usr/[local/]share/recoll/examples)
|
164 |
command. The initial configuration will index your home directory with
|
167 |
during installation. The default parameters from this file may be
|
165 |
default parameters and should be sufficient for giving Recoll a try, but
|
168 |
overridden by values that you set inside your personal configuration, found
|
166 |
you may want to adjust it later.
|
169 |
by default in the .recoll subdirectory of your home directory. The default
|
|
|
170 |
configuration will index your home directory with default parameters and
|
|
|
171 |
should be sufficient for giving Recoll a try, but you may want to adjust
|
|
|
172 |
it later.
|
167 |
|
173 |
|
168 |
Indexation is started automatically the first time you execute the recoll
|
174 |
Indexation is started automatically the first time you execute the recoll
|
169 |
search graphical user interface, or by executing the recollindex command.
|
175 |
search graphical user interface, or by executing the recollindex command.
|
170 |
|
176 |
|
171 |
Searches are performed inside the recoll program, which has many options
|
177 |
Searches are performed inside the recoll program, which has many options
|
|
... |
|
... |
214 |
|
220 |
|
215 |
----------------------------------------------------------------------
|
221 |
----------------------------------------------------------------------
|
216 |
|
222 |
|
217 |
2.2. The indexation configuration
|
223 |
2.2. The indexation configuration
|
218 |
|
224 |
|
219 |
The main configuration file is named $HOME/.recoll/recoll.conf by default
|
225 |
Values set in the system-wide configuration file (named like
|
|
|
226 |
/usr/[local/]share/recoll/examples/recoll.conf) can be overridden by those
|
|
|
227 |
set in the personal one, named $HOME/.recoll/recoll.conf by default or
|
220 |
or $RECOLL_CONFDIR/recoll.conf if RECOLL_CONFDIR is set.
|
228 |
$RECOLL_CONFDIR/recoll.conf if RECOLL_CONFDIR is set.
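The lookup described above can be sketched in a few lines of Python. This is only an illustration of the documented rules, not Recoll's actual code: the function names and the example_key parameter are hypothetical, and the two dictionaries stand in for values already read from the system-wide and personal files.

```python
import os
import posixpath


def personal_config_path(environ=None):
    """Return the personal recoll.conf location.

    Uses $RECOLL_CONFDIR when it is set, otherwise $HOME/.recoll.
    """
    env = os.environ if environ is None else environ
    confdir = env.get("RECOLL_CONFDIR")
    if not confdir:
        confdir = posixpath.join(env.get("HOME", ""), ".recoll")
    return posixpath.join(confdir, "recoll.conf")


def effective_value(name, system_values, personal_values):
    """A value set in the personal file wins over the system-wide one."""
    if name in personal_values:
        return personal_values[name]
    return system_values.get(name)


if __name__ == "__main__":
    print(personal_config_path({"HOME": "/home/me"}))
    # example_key is a made-up parameter name used only for this demo.
    print(effective_value("example_key", {"example_key": "default"}, {}))
```

Passing the environment as an argument keeps the sketch easy to try without changing your real RECOLL_CONFDIR.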
|
221 |
|
229 |
|
222 |
The most accurate documentation for editing the file is given by comments
|
230 |
The most accurate documentation for editing the file is given by comments
|
223 |
inside the default file that will be created when you first start recoll.
|
231 |
inside the central one. If you want to adjust the configuration before
|
224 |
If you want to adjust the configuration before indexation, just click
|
232 |
indexation, just click Cancel when the program asks if it should start
|
225 |
Cancel when the program asks if it should start initial indexation.
|
233 |
initial indexation. This will have created a .recoll directory containing
|
|
|
234 |
empty configuration files.
|
226 |
|
235 |
|
227 |
The configuration is also documented inside the installation chapter of
|
236 |
The configuration is also documented inside the installation chapter of
|
228 |
this document, or in the recoll.conf(5) man page.
|
237 |
this document, or in the recoll.conf(5) man page.
|
229 |
|
238 |
|
230 |
----------------------------------------------------------------------
|
239 |
----------------------------------------------------------------------
|
|
... |
|
... |
281 |
checkbox to ensure that only documents with all the terms will be
|
290 |
checkbox to ensure that only documents with all the terms will be
|
282 |
returned. Use the Tools / Advanced search dialog for more complex
|
291 |
returned. Use the Tools / Advanced search dialog for more complex
|
283 |
searches.
|
292 |
searches.
|
284 |
|
293 |
|
285 |
After starting a search, a list of results will instantly be displayed in
|
294 |
After starting a search, a list of results will instantly be displayed in
|
286 |
the main list window. Clicking on an entry will open an internal preview
|
295 |
the main list window. Clicking on the Preview link for an entry will open
|
287 |
window for the document. Double-clicking will attempt to start an external
|
296 |
an internal preview window for the document. Clicking the Edit link will
|
288 |
viewer (have a look at the ~/.recoll/mimeconf file to see how these are
|
297 |
attempt to start an external viewer (have a look at the mimeconf
|
289 |
configured).
|
298 |
configuration file to see how these are configured).
|
290 |
|
299 |
|
291 |
By default, the document list is presented in order of relevance (how well
|
300 |
By default, the document list is presented in order of relevance (how well
|
292 |
the system estimates that the document matches the query). You can specify
|
301 |
the system estimates that the document matches the query). You can specify
|
293 |
a different ordering by using the Tools / Sort parameters dialog.
|
302 |
a different ordering by using the Tools / Sort parameters dialog.
|
294 |
|
303 |
|
295 |
You can click on the first paragraph (Query results or No results found)
|
304 |
You can click on the Query details link at the top of the results page to
|
296 |
in the result list to get an exact display of the query actually
|
305 |
see the query actually performed, after stem expansion and other
|
297 |
performed, after stem expansion and other processing.
|
306 |
processing.
|
|
|
307 |
|
|
|
308 |
----------------------------------------------------------------------
|
|
|
309 |
|
|
|
310 |
3.1.1. Filename search
|
|
|
311 |
|
|
|
312 |
If the File name checkbox at the left of the search terms is checked, the
|
|
|
313 |
search will only be done for file names. In this case you can use the usual
|
|
|
314 |
shell wildcard characters * and ? for expanding the search (e.g.
|
|
|
315 |
*somestring*).
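If you are unsure what a wildcard pattern will match, you can try it out with Python's standard fnmatch module, which implements the same shell-style * and ? characters. This is only a rough approximation: Recoll's own matching may differ in details such as case sensitivity, and fnmatch also accepts [seq] character classes, which are not mentioned above. The file names below are invented for the example.

```python
from fnmatch import fnmatchcase


def match_file_names(pattern, names):
    """Return the names matching a shell-style wildcard pattern.

    '*' matches any run of characters, '?' matches exactly one.
    """
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    names = [
        "report.txt",
        "notes-2005.txt",
        "somestring.pdf",
        "my_somestring_v2.doc",
    ]
    print(match_file_names("*somestring*", names))
    print(match_file_names("report.tx?", names))
```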
|
298 |
|
316 |
|
299 |
----------------------------------------------------------------------
|
317 |
----------------------------------------------------------------------
|
300 |
|
318 |
|
301 |
3.2. Complex/advanced search
|
319 |
3.2. Complex/advanced search
|
302 |
|
320 |
|
303 |
The advanced search dialog has fields that will allow a more refined
|
321 |
The advanced search dialog has fields that will allow a more refined
|
304 |
search, looking for documents with all given words, a given exact phrase,
|
322 |
search, looking for documents with all given words, a given exact phrase,
|
305 |
or none of the given words (all relevant fields will be combined by an
|
323 |
none of the given words, or a given file name (with wildcard expansion).
|
306 |
implicit AND clause).
|
324 |
All relevant fields will be combined by an implicit AND clause.
|
307 |
|
325 |
|
308 |
It will let you search for documents of specific mime types (e.g. only
|
326 |
It will let you search for documents of specific mime types (e.g. only
|
309 |
text/plain, or text/html or application/pdf etc...)
|
327 |
text/plain, or text/html or application/pdf etc...)
|
310 |
|
328 |
|
311 |
It will let you restrict the search results to a subtree of the indexed
|
329 |
It will let you restrict the search results to a subtree of the indexed
|
312 |
area.
|
330 |
area.
|
313 |
|
331 |
|
314 |
Click on the Start Search button in the advanced search dialog to start
|
332 |
Click on the Start Search button in the advanced search dialog to start
|
315 |
the search. The button in the main window always performs a simple search.
|
333 |
the search. The button in the main window always performs a simple search.
|
316 |
|
334 |
|
317 |
Click on the result list header paragraph to see the query expansion.
|
335 |
Click on the Show query details link at the top of the result page to see
|
|
|
336 |
the query expansion.
|
318 |
|
337 |
|
319 |
----------------------------------------------------------------------
|
338 |
----------------------------------------------------------------------
|
320 |
|
339 |
|
321 |
3.3. Document history
|
340 |
3.3. Document history
|
322 |
|
341 |
|
|
... |
|
... |
335 |
The tool sorts a specified number of the most relevant documents in the
|
354 |
The tool sorts a specified number of the most relevant documents in the
|
336 |
result list, according to specified criteria. The currently available
|
355 |
result list, according to specified criteria. The currently available
|
337 |
criteria are date and mime type.
|
356 |
criteria are date and mime type.
|
338 |
|
357 |
|
339 |
The sort parameters stay in effect until they are explicitly reset, or
|
358 |
The sort parameters stay in effect until they are explicitly reset, or
|
340 |
the program exits.
|
359 |
the program exits. An activated sort is indicated in the result list
|
|
|
360 |
header.
|
341 |
|
361 |
|
342 |
----------------------------------------------------------------------
|
362 |
----------------------------------------------------------------------
|
343 |
|
363 |
|
344 |
3.5. Search tips, shortcuts
|
364 |
3.5. Search tips, shortcuts
|
345 |
|
365 |
|
|
... |
|
... |
356 |
Query explanation. You can get an exact description of what the query
|
376 |
Query explanation. You can get an exact description of what the query
|
357 |
looked for, including stem expansion, and boolean operators used, by
|
377 |
looked for, including stem expansion, and boolean operators used, by
|
358 |
clicking on the result list header.
|
378 |
clicking on the result list header.
|
359 |
|
379 |
|
360 |
File names. All file name elements (the broken up file path) are entered
|
380 |
File names. All file name elements (the broken up file path) are entered
|
361 |
as terms during indexation, and you can specify them when searching.
|
381 |
as terms during indexation, and you can specify them as ordinary terms in
|
|
|
382 |
normal search fields. Alternatively, you can use specific file name search
|
|
|
383 |
which will only look for file names and can use wildcard expansion.
|
362 |
|
384 |
|
363 |
Quitting. Entering ^Q almost anywhere will close the application.
|
385 |
Quitting. Entering ^Q almost anywhere will close the application.
|
364 |
|
386 |
|
365 |
Closing previews. Entering ^W in a preview tab will close it (and, for the
|
387 |
Closing previews. Entering ^W in a preview tab will close it (and, for the
|
366 |
last tab, close the preview window).
|
388 |
last tab, close the preview window).
|
|
... |
|
... |
436 |
|
458 |
|
437 |
External file types. Recoll uses external applications to index some file
|
459 |
External file types. Recoll uses external applications to index some file
|
438 |
types. You need to install them for the file types that you wish to have
|
460 |
types. You need to install them for the file types that you wish to have
|
439 |
indexed:
|
461 |
indexed:
|
440 |
|
462 |
|
|
|
463 |
* PDF: pdftotext is part of the Xpdf package.
|
|
|
464 |
|
|
|
465 |
* Postscript: pstotext.
|
|
|
466 |
|
441 |
* MS Word: antiword.
|
467 |
* MS Word: antiword.
|
442 |
|
468 |
|
443 |
* PDF: pdftotext is part of the Xpdf package.
|
|
|
444 |
|
|
|
445 |
* Postscript: pstotext.
|
|
|
446 |
|
|
|
447 |
* RTF: unrtf
|
469 |
* RTF: unrtf
|
|
|
470 |
|
|
|
471 |
* dvi: dvips
|
|
|
472 |
|
|
|
473 |
* djvu: DjVuLibre
|
|
|
474 |
|
|
|
475 |
Text, HTML, mail folders and OpenOffice files are processed internally.
|
448 |
|
476 |
|
449 |
----------------------------------------------------------------------
|
477 |
----------------------------------------------------------------------
|
450 |
|
478 |
|
451 |
4.1.2. Building
|
479 |
4.1.2. Building
|
452 |
|
480 |
|
|
... |
|
... |
523 |
|
551 |
|
524 |
----------------------------------------------------------------------
|
552 |
----------------------------------------------------------------------
|
525 |
|
553 |
|
526 |
4.3. Configuration overview
|
554 |
4.3. Configuration overview
|
527 |
|
555 |
|
528 |
The personal configuration files and the database are normally kept in the
|
556 |
There are two sets of configuration files. The system-wide files are kept
|
|
|
557 |
in a directory named like /usr/[local/]share/recoll/examples; they define
|
|
|
558 |
default values for the system. A parallel set of files exists in the
|
529 |
.recoll directory in your home (this can be changed with the
|
559 |
.recoll directory in your home (this can be changed with the
|
530 |
RECOLL_CONFDIR environment variable, and a parameter inside the main
|
560 |
RECOLL_CONFDIR environment variable). The database is also kept in .recoll
|
531 |
configuration file). If this directory does not exist when recoll or
|
561 |
by default (this can be changed by a configuration parameter).
|
532 |
recollindex are started, the directory will be created and the sample
|
562 |
|
533 |
configuration files will be copied. recoll will give you a chance to edit
|
563 |
If the .recoll directory does not exist when recoll or recollindex are
|
534 |
the configuration file before starting indexation. recollindex will
|
564 |
started, it will be created with a set of empty configuration files.
|
535 |
proceed immediately.
|
565 |
recoll will give you a chance to edit the configuration file before
|
|
|
566 |
starting indexation. recollindex will proceed immediately.
|
536 |
|
567 |
|
537 |
Most of the parameters specific to the recoll GUI are set through the
|
568 |
Most of the parameters specific to the recoll GUI are set through the
|
538 |
Preferences menu and stored in the standard Qt place ($HOME/.qt/recollrc).
|
569 |
Preferences menu and stored in the standard Qt place ($HOME/.qt/recollrc).
|
539 |
You probably do not want to edit this by hand.
|
570 |
You probably do not want to edit this by hand.
|
540 |
|
571 |
|
541 |
For other options, Recoll uses text configuration files. You will have to
|
572 |
For other options, Recoll uses text configuration files. You will have to
|
542 |
edit them by hand for now (there is still some hope for a GUI
|
573 |
edit them by hand for now (there is still some hope for a GUI
|
543 |
configuration tool in the future). The most accurate documentation for the
|
574 |
configuration tool in the future). The most accurate documentation for the
|
544 |
configuration parameters is given by comments inside the sample files, and
|
575 |
configuration parameters is given by comments inside the default files,
|
545 |
we will just give a general overview here.
|
576 |
and we will just give a general overview here.
|
546 |
|
577 |
|
547 |
All configuration files share the same format. For example, a short
|
578 |
All configuration files share the same format. For example, a short
|
548 |
extract of the main configuration file might look as follows:
|
579 |
extract of the main configuration file might look as follows:
|
549 |
|
580 |
|
550 |
# Space-separated list of directories to index.
|
581 |
# Space-separated list of directories to index.
|
|
... |
|
... |
575 |
|
606 |
|
576 |
----------------------------------------------------------------------
|
607 |
----------------------------------------------------------------------
|
577 |
|
608 |
|
578 |
4.3.1. Main configuration file
|
609 |
4.3.1. Main configuration file
|
579 |
|
610 |
|
580 |
~/.recoll/recoll.conf is the main configuration file. It defines things
|
611 |
recoll.conf is the main configuration file. It defines things like what to
|
581 |
like what to index (top directories and things to ignore), and the default
|
612 |
index (top directories and things to ignore), and the default character
|
582 |
character set to use for document types which do not specify it
|
613 |
set to use for document types which do not specify it internally.
|
583 |
internally.
|
|
|
584 |
|
614 |
|
585 |
The default configuration will index your home directory. If this is not
|
615 |
The default configuration will index your home directory. If this is not
|
586 |
appropriate, use recoll to copy the sample configuration, click Cancel,
|
616 |
appropriate, use recoll to copy the sample configuration, click Cancel,
|
587 |
and edit the configuration file before restarting the command. This will
|
617 |
and edit the configuration file before restarting the command. This will
|
588 |
start the initial indexation, which may take some time.
|
618 |
start the initial indexation, which may take some time.
|
|
... |
|
... |
668 |
determining the mime type for a file (the main procedure uses
|
698 |
determining the mime type for a file (the main procedure uses
|
669 |
suffix associations as defined in the mimemap file). This can be
|
699 |
suffix associations as defined in the mimemap file). This can be
|
670 |
useful for files with suffixless names, but it will also cause the
|
700 |
useful for files with suffixless names, but it will also cause the
|
671 |
indexation of many bogus "text" files.
|
701 |
indexation of many bogus "text" files.
|
672 |
|
702 |
|
|
|
703 |
indexallfilenames
|
|
|
704 |
|
|
|
705 |
Recoll indexes file names in a special section of the database to
|
|
|
706 |
allow specific file name searches using wildcards. This
|
|
|
707 |
parameter decides if file name indexing is performed only for
|
|
|
708 |
files with mime types that would qualify them for full text
|
|
|
709 |
indexation, or for all files inside the selected subtrees,
|
|
|
710 |
independent of mime type.
|
|
|
711 |
|
673 |
----------------------------------------------------------------------
|
712 |
----------------------------------------------------------------------
|
674 |
|
713 |
|
675 |
4.3.2. The mimemap file
|
714 |
4.3.2. The mimemap file
|
676 |
|
715 |
|
677 |
~/.recoll/mimemap specifies the file name extension to mime type mappings.
|
716 |
mimemap specifies the file name extension to mime type mappings.
|
678 |
|
717 |
|
679 |
For file names without an extension, or with an unknown one, the system's
|
718 |
For file names without an extension, or with an unknown one, the system's
|
680 |
file -i command will be executed to determine the mime type (this can be
|
719 |
file -i command will be executed to determine the mime type (this can be
|
681 |
switched off inside the main configuration file).
|
720 |
switched off inside the main configuration file).
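The two-step procedure described above (suffix lookup first, then the file -i command) can be sketched as follows. This is a hedged illustration, not Recoll's implementation: the SUFFIX_TO_MIME table is a made-up sample rather than the contents of the real mimemap file, the function name and the use_file_command flag are hypothetical, and the output parsing assumes a file command that prints "name: type; charset=...".

```python
import os
import subprocess

# Hypothetical sample of suffix associations, for illustration only.
SUFFIX_TO_MIME = {
    ".txt": "text/plain",
    ".html": "text/html",
    ".pdf": "application/pdf",
}


def guess_mime(path, use_file_command=True):
    """Guess a mime type: suffix table first, then 'file -i' as a fallback."""
    suffix = os.path.splitext(path)[1]
    if suffix in SUFFIX_TO_MIME:
        return SUFFIX_TO_MIME[suffix]
    if not use_file_command:
        # Mirrors switching the fallback off in the main configuration.
        return None
    try:
        result = subprocess.run(
            ["file", "-i", path], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None  # the 'file' command is not installed
    if result.returncode != 0 or ":" not in result.stdout:
        return None
    # Keep what follows the last colon, then drop the charset part.
    described = result.stdout.rsplit(":", 1)[1]
    return described.split(";", 1)[0].strip() or None


if __name__ == "__main__":
    print(guess_mime("notes.txt"))
    print(guess_mime("README", use_file_command=False))
```

The fallback is the slow path, since it starts an external process for each file, which is one reason you might prefer to switch it off.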
|
682 |
|
721 |
|
|
... |
|
... |
697 |
|
736 |
|
698 |
----------------------------------------------------------------------
|
737 |
----------------------------------------------------------------------
|
699 |
|
738 |
|
700 |
4.3.3. The mimeconf file
|
739 |
4.3.3. The mimeconf file
|
701 |
|
740 |
|
702 |
~/.recoll/mimeconf specifies how the different mime types are handled for
|
741 |
mimeconf specifies how the different mime types are handled for
|
703 |
indexation, and for display.
|
742 |
indexation, and for display.
|
704 |
|
743 |
|
705 |
Changing the indexation parameters is probably not a good idea except if
|
744 |
Changing the indexation parameters is probably not a good idea except if
|
706 |
you are a Recoll developer.
|
745 |
you are a Recoll developer.
|
707 |
|
746 |
|
708 |
You may want to adjust the external viewers defined in this file (e.g. html is either
|
747 |
You may want to adjust the external viewers defined in this file (e.g. html is either
|
709 |
previewed internally or displayed using firefox, but you may prefer
|
748 |
previewed internally or displayed using firefox, but you may prefer
|
|
|
749 |
mozilla, your openoffice.org program might be named ooffice instead of
|
710 |
mozilla...). Look for the [view] section.
|
750 |
openoffice ...). Look for the [view] section.
|
711 |
|
751 |
|
712 |
You can also change the icons which are displayed by recoll in the result
|
752 |
You can also change the icons which are displayed by recoll in the result
|
713 |
lists (the values are the basenames of the png images inside the iconsdir
|
753 |
lists (the values are the basenames of the png images inside the iconsdir
|
714 |
directory (specified in recoll.conf)).
|
754 |
directory (specified in recoll.conf)).
|
715 |
|
755 |
|