|
a/src/README |
|
b/src/README |
|
... |
|
... |
25 |
|
25 |
|
26 |
1.2. Full text search
|
26 |
1.2. Full text search
|
27 |
|
27 |
|
28 |
1.3. Recoll overview
|
28 |
1.3. Recoll overview
|
29 |
|
29 |
|
30 |
2. Indexation
|
30 |
2. Indexing
|
31 |
|
31 |
|
32 |
2.1. Introduction
|
32 |
2.1. Introduction
|
33 |
|
33 |
|
|
|
34 |
2.2. Index storage
|
|
|
35 |
|
|
|
36 |
2.2.1. Security aspects
|
|
|
37 |
|
34 |
2.2. The indexation configuration
|
38 |
2.3. The indexing configuration
|
35 |
|
39 |
|
36 |
2.3. Starting indexation
|
40 |
2.4. Starting indexing
|
37 |
|
41 |
|
38 |
2.4. Using cron to automate indexation
|
42 |
2.5. Using cron to automate indexing
|
39 |
|
43 |
|
40 |
3. Search
|
44 |
3. Search
|
41 |
|
45 |
|
42 |
3.1. Simple search
|
46 |
3.1. Simple search
|
43 |
|
47 |
|
44 |
3.2. Complex/advanced search
|
48 |
3.2. Complex/advanced search
|
45 |
|
49 |
|
|
|
50 |
3.3. Multiple databases
|
|
|
51 |
|
46 |
3.3. Document history
|
52 |
3.4. Document history
|
47 |
|
53 |
|
48 |
3.4. Result list sorting
|
54 |
3.5. Result list sorting
|
49 |
|
55 |
|
|
|
56 |
3.6. Additional result list functionality
|
|
|
57 |
|
50 |
3.5. Search tips, shortcuts
|
58 |
3.7. Search tips, shortcuts
|
51 |
|
59 |
|
52 |
3.6. Customising the search interface
|
60 |
3.8. Customising the search interface
|
53 |
|
61 |
|
54 |
4. Installation
|
62 |
4. Installation
|
55 |
|
63 |
|
56 |
4.1. Building from source
|
64 |
4.1. Building from source
|
57 |
|
65 |
|
|
... |
|
... |
134 |
1.3. Recoll overview
|
142 |
1.3. Recoll overview
|
135 |
|
143 |
|
136 |
Recoll uses the Xapian information retrieval library as its storage and
|
144 |
Recoll uses the Xapian information retrieval library as its storage and
|
137 |
retrieval engine. Xapian is a very mature package using a sophisticated
|
145 |
retrieval engine. Xapian is a very mature package using a sophisticated
|
138 |
probabilistic ranking model. Recoll provides the interface to get data
|
146 |
probabilistic ranking model. Recoll provides the interface to get data
|
139 |
into (indexation) and out (searching) of the system.
|
147 |
into (indexing) and out (searching) of the system.
|
140 |
|
148 |
|
141 |
In practice, Xapian works by remembering where terms appear in your
|
149 |
In practice, Xapian works by remembering where terms appear in your
|
142 |
document files. The acquisition process is called indexation.
|
150 |
document files. The acquisition process is called indexing.
|
143 |
|
151 |
|
144 |
The resulting database can be big (roughly the size of the original
|
152 |
The resulting index can be big (roughly the size of the original document
|
145 |
document set), but it is not a document archive. Recoll can only display
|
153 |
set), but it is not a document archive. Recoll can only display documents
|
146 |
documents that still exist at the place from which they were indexed.
|
154 |
that still exist at the place from which they were indexed. (Actually,
|
147 |
(Actually, there is a way to reconstruct a document from the information
|
155 |
there is a way to reconstruct a document from the information in the
|
148 |
in the database, but the result is not nice, as all formatting,
|
156 |
index, but the result is not nice, as all formatting, punctuation and
|
149 |
punctuation and capitalisation are lost).
|
157 |
capitalisation are lost).
|
150 |
|
158 |
|
151 |
Recoll stores all internal data in Unicode UTF-8 format, and it can index
|
159 |
Recoll stores all internal data in Unicode UTF-8 format, and it can index
|
152 |
files with different character sets, encodings, and languages into the
|
160 |
files with different character sets, encodings, and languages into the
|
153 |
same database. It has input filters for many document types.
|
161 |
same index. It has input filters for many document types.
|
154 |
|
162 |
|
155 |
Stemming depends on the document language. Recoll stores the unstemmed
|
163 |
Stemming depends on the document language. Recoll stores the unstemmed
|
156 |
versions of terms and uses auxiliary databases for term expansion. It can
|
164 |
versions of terms and uses auxiliary databases for term expansion. It can
|
157 |
switch stemming languages, or add a language, without reindexing. Storing
|
165 |
switch stemming languages, or add a language, without reindexing. Storing
|
158 |
documents in different languages in the same database is possible, and
|
166 |
documents in different languages in the same index is possible, and useful
|
159 |
useful in practice, but does introduce possibilities of confusion. Recoll
|
167 |
in practice, but does introduce possibilities of confusion. Recoll
|
160 |
currently makes no attempt at automatic language recognition.
|
168 |
currently makes no attempt at automatic language recognition.
|
161 |
|
169 |
|
162 |
Recoll has many parameters which define exactly what to index, and how to
|
170 |
Recoll has many parameters which define exactly what to index, and how to
|
163 |
classify and decode the source documents. These are kept in a
|
171 |
classify and decode the source documents. These are kept in a
|
164 |
configuration file. A default configuration is copied into a standard
|
172 |
configuration file. A default configuration is copied into a standard
|
|
... |
|
... |
168 |
by default in the .recoll subdirectory of your home directory. The default
|
176 |
by default in the .recoll subdirectory of your home directory. The default
|
169 |
configuration will index your home directory with default parameters and
|
177 |
configuration will index your home directory with default parameters and
|
170 |
should be sufficient for giving Recoll a try, but you may want to adjust
|
178 |
should be sufficient for giving Recoll a try, but you may want to adjust
|
171 |
it later.
|
179 |
it later.
|
172 |
|
180 |
|
173 |
Indexation is started automatically the first time you execute the recoll
|
181 |
Indexing is started automatically the first time you execute the recoll
|
174 |
search graphical user interface, or by executing the recollindex command.
|
182 |
search graphical user interface, or by executing the recollindex command.
|
175 |
|
183 |
|
176 |
Searches are performed inside the recoll program, which has many options
|
184 |
Searches are performed inside the recoll program, which has many options
|
177 |
to help you find what you are looking for.
|
185 |
to help you find what you are looking for.
|
178 |
|
186 |
|
179 |
----------------------------------------------------------------------
|
187 |
----------------------------------------------------------------------
|
180 |
|
188 |
|
181 |
Chapter 2. Indexation
|
189 |
Chapter 2. Indexing
|
182 |
|
190 |
|
183 |
2.1. Introduction
|
191 |
2.1. Introduction
|
184 |
|
192 |
|
185 |
Indexation is the process by which the set of documents is analyzed and
|
193 |
Indexing is the process by which the set of documents is analyzed and the
|
186 |
the data entered into the database. Recoll indexation is normally
|
194 |
data entered into the database. Recoll indexing is normally incremental:
|
187 |
incremental: documents will only be processed if they have been modified.
|
195 |
documents will only be processed if they have been modified. On the first
|
188 |
On the first execution, of course, all documents will need processing. A
|
196 |
execution, of course, all documents will need processing. A full index
|
189 |
full index build can be forced later on by specifying an option to the
|
197 |
build can be forced later on by specifying an option to the indexing
|
190 |
indexation command (recollindex -z).
|
198 |
command (recollindex -z).
|
191 |
|
199 |
|
192 |
Recoll indexation takes place at discrete times. There is currently no
|
200 |
Recoll indexing takes place at discrete times. There is currently no
|
193 |
interface to real time file modification monitors. The typical usage is to
|
201 |
interface to real time file modification monitors. The typical usage is to
|
194 |
have a nightly indexation run programmed into your cron file.
|
202 |
have a nightly indexing run programmed into your cron file.
|
195 |
|
203 |
|
196 |
+------------------------------------------------------------------------+
|
204 |
+------------------------------------------------------------------------+
|
197 |
| Side note: there is nothing in Recoll and Xapian that would prevent |
|
205 |
| Side note: there is nothing in Recoll and Xapian that would prevent |
|
198 |
| interfacing with a real time file modification monitor, but this would |
|
206 |
| interfacing with a real time file modification monitor, but this would |
|
199 |
| tend to consume significant system resources for dubious gain, because |
|
207 |
| tend to consume significant system resources for dubious gain, because |
|
|
... |
|
... |
206 |
for document types recognition and processing are set in configuration
|
214 |
for document types recognition and processing are set in configuration
|
207 |
files. Most file types, like HTML or word processing files, only hold one
|
215 |
files. Most file types, like HTML or word processing files, only hold one
|
208 |
document. Some file types, like mail folder files, can hold many
|
216 |
document. Some file types, like mail folder files, can hold many
|
209 |
individually indexed documents.
|
217 |
individually indexed documents.
|
210 |
|
218 |
|
211 |
Recoll indexation processes plain text, HTML, openoffice and e-mail files
|
219 |
Recoll indexing processes plain text, HTML, openoffice and e-mail files
|
212 |
internally. Other types (e.g. PostScript, PDF, MS Word, RTF) need external
|
220 |
internally. Other types (e.g. PostScript, PDF, MS Word, RTF) need external
|
213 |
applications for preprocessing. The list is in the installation section.
|
221 |
applications for preprocessing. The list is in the installation section.
|
214 |
|
222 |
|
215 |
Without further configuration, Recoll will index all appropriate files
|
223 |
Without further configuration, Recoll will index all appropriate files
|
216 |
from your home directory, with a reasonable set of defaults.
|
224 |
from your home directory, with a reasonable set of defaults.
|
217 |
|
225 |
|
218 |
----------------------------------------------------------------------
|
226 |
----------------------------------------------------------------------
|
219 |
|
227 |
|
|
|
228 |
2.2. Index storage
|
|
|
229 |
|
|
|
230 |
The default location for the index data is the $HOME/.recoll/xapiandb/
|
|
|
231 |
directory. This can be changed by setting the RECOLL_CONFDIR environment
|
|
|
232 |
variable, or by specifying the dbdir parameter in the configuration file
|
|
|
233 |
(see the configuration section).
|
|
|
234 |
|
|
|
235 |
The size of the index is determined by the size of the set of documents,
|
|
|
236 |
but the ratio can vary a lot. For a typical mixed set of documents, the
|
|
|
237 |
index size will often be close to the data set size. In specific cases (a
|
|
|
238 |
set of compressed mbox files for example), the index can become much
|
|
|
239 |
bigger than the documents. It may also be much smaller if the documents
|
|
|
240 |
contain a lot of images or other non-indexed data (an extreme example
|
|
|
241 |
being a set of mp3 files where only the tags would be indexed).
|
|
|
242 |
|
|
|
243 |
Of course, images, sound and video do not increase the index size, which
|
|
|
244 |
means that it will be quite typical nowadays (2006), that even a big index
|
|
|
245 |
will be negligible against the total amount of data on the computer.
|
|
|
246 |
|
|
|
247 |
The index data directory only contains data that will be rebuilt by an
|
|
|
248 |
index run, so that it can be destroyed safely.
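
As a minimal sketch of the location rule described above, the following
shell function prints where the index would live when the dbdir parameter
is not set. It assumes that xapiandb sits directly under the configuration
directory, as the default path suggests. recoll_index_dir is a hypothetical
helper name, not a Recoll command.

```shell
# Hypothetical helper: print the expected index directory.
# Assumption: dbdir is not set in recoll.conf, so the index is the
# xapiandb subdirectory of the configuration directory.
recoll_index_dir() {
    printf '%s\n' "${RECOLL_CONFDIR:-$HOME/.recoll}/xapiandb"
}

# Show how much disk space the index currently uses, if it exists.
index_dir=$(recoll_index_dir)
if [ -d "$index_dir" ]; then
    du -sh "$index_dir"
fi
```

Comparing this figure with the size of the indexed document set gives a
rough idea of the ratio discussed above.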
|
|
|
249 |
|
|
|
250 |
----------------------------------------------------------------------
|
|
|
251 |
|
|
|
252 |
2.2.1. Security aspects
|
|
|
253 |
|
|
|
254 |
The Recoll index does not hold copies of the indexed documents. But it
|
|
|
255 |
does hold enough data to allow for an almost complete reconstruction. If
|
|
|
256 |
confidential data is indexed, access to the database directory should be
|
|
|
257 |
restricted.
|
|
|
258 |
|
|
|
259 |
As of version 1.4, Recoll will create the configuration directory with a
|
|
|
260 |
mode of 0700 (access by owner only). As the index directory is by default
|
|
|
261 |
a subdirectory of the configuration directory, this should result in
|
|
|
262 |
appropriate protection.
|
|
|
263 |
|
|
|
264 |
If you use another setup, you should think of the kind of protection you
|
|
|
265 |
need for your index, and set the directory access modes appropriately.
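
For a non-default setup, one possible way to get the same owner-only
protection is sketched below. protect_index_dir is a hypothetical helper
and the path in the usage comment is a placeholder. Whether owner-only
access is the right policy depends on who needs to search the index.

```shell
# Hypothetical helper: restrict a directory to its owner (mode 0700).
protect_index_dir() {
    chmod 700 "$1"
}

# Example: protect an index kept outside the default configuration
# directory. The path below is a placeholder, not a Recoll default.
# protect_index_dir /some/place/xapiandb
```

An index that is meant to be shared (a central index set up by an
administrator, for instance) would need a less restrictive mode.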
|
|
|
266 |
|
|
|
267 |
----------------------------------------------------------------------
|
|
|
268 |
|
220 |
2.2. The indexation configuration
|
269 |
2.3. The indexing configuration
|
221 |
|
270 |
|
222 |
Values set in the system-wide configuration file (named like
|
271 |
Values set in the system-wide configuration file (named like
|
223 |
/usr/[local/]share/recoll/examples/recoll.conf) can be overridden by those
|
272 |
/usr/[local/]share/recoll/examples/recoll.conf) can be overridden by those
|
224 |
set in the personal one, named $HOME/.recoll/recoll.conf by default or
|
273 |
set in the personal one, named $HOME/.recoll/recoll.conf by default or
|
225 |
$RECOLL_CONFDIR/recoll.conf if RECOLL_CONFDIR is set.
|
274 |
$RECOLL_CONFDIR/recoll.conf if RECOLL_CONFDIR is set.
|
226 |
|
275 |
|
227 |
The most accurate documentation for editing the file is given by comments
|
276 |
The most accurate documentation for editing the file is given by comments
|
228 |
inside the central one. If you want to adjust the configuration before
|
277 |
inside the central one. If you want to adjust the configuration before
|
229 |
indexation, just click Cancel when the program asks if it should start
|
278 |
indexing, just click Cancel when the program asks if it should start
|
230 |
initial indexation. This will have created a .recoll directory containing
|
279 |
initial indexing. This will have created a .recoll directory containing
|
231 |
empty configuration files.
|
280 |
empty configuration files.
|
232 |
|
281 |
|
233 |
The configuration is also documented inside the installation chapter of
|
282 |
The configuration is also documented inside the installation chapter of
|
234 |
this document, or in the recoll.conf(5) man page.
|
283 |
this document, or in the recoll.conf(5) man page.
|
235 |
|
284 |
|
236 |
----------------------------------------------------------------------
|
285 |
----------------------------------------------------------------------
|
237 |
|
286 |
|
238 |
2.3. Starting indexation
|
287 |
2.4. Starting indexing
|
239 |
|
288 |
|
240 |
Indexation is performed either by the recollindex program, or by the
|
289 |
Indexing is performed either by the recollindex program, or by the
|
241 |
indexation thread inside the recoll program (use the File menu).
|
290 |
indexing thread inside the recoll program (use the File menu).
|
242 |
|
291 |
|
243 |
If the recoll program finds no database when it starts, it will
|
292 |
If the recoll program finds no index when it starts, it will automatically
|
244 |
automatically start indexation (except if cancelled).
|
293 |
start indexing (except if cancelled).
|
245 |
|
294 |
|
246 |
It is best to avoid interrupting the indexation process, as this may
|
295 |
It is best to avoid interrupting the indexing process, as this may
|
247 |
sometimes leave the database in a bad state. This is not a serious
|
296 |
sometimes leave the database in a bad state. This is not a serious
|
248 |
problem, as you then just need to clear everything and restart the
|
297 |
problem, as you then just need to clear everything and restart the
|
249 |
indexation: the database files are normally stored in the
|
298 |
indexing: the index files are normally stored in the
|
250 |
$HOME/.recoll/xapiandb directory, which you can just delete if needed.
|
299 |
$HOME/.recoll/xapiandb directory, which you can just delete if needed.
|
251 |
Alternatively, you can start recollindex -z, which will reset the database
|
300 |
Alternatively, you can start recollindex -z, which will reset the database
|
252 |
before indexation.
|
301 |
before indexing.
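
The manual recovery described above could be scripted roughly as follows.
This is a sketch only: clear_index_dir is a hypothetical helper, and it
assumes the index directory is named xapiandb as in the default setup.

```shell
# Hypothetical helper: delete an index directory so that the next
# indexing run rebuilds it from scratch. As a safety net it refuses
# any path whose last component is not "xapiandb".
clear_index_dir() {
    case "$1" in
        */xapiandb|*/xapiandb/) rm -rf -- "$1" ;;
        *) echo "refusing to delete $1" >&2; return 1 ;;
    esac
}

# Typical use (commented out so nothing is deleted by accident):
# clear_index_dir "$HOME/.recoll/xapiandb" && recollindex
# The shorter alternative mentioned above is: recollindex -z
```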
|
253 |
|
302 |
|
254 |
----------------------------------------------------------------------
|
303 |
----------------------------------------------------------------------
|
255 |
|
304 |
|
256 |
2.4. Using cron to automate indexation
|
305 |
2.5. Using cron to automate indexing
|
257 |
|
306 |
|
258 |
The most common way to set up indexation is to have a cron task execute it
|
307 |
The most common way to set up indexing is to have a cron task execute it
|
259 |
every night. For example the following crontab entry would do it every day
|
308 |
every night. For example the following crontab entry would do it every day
|
260 |
at 3:30AM (supposing recollindex is in your PATH):
|
309 |
at 3:30AM (supposing recollindex is in your PATH):
|
261 |
|
310 |
|
262 |
30 3 * * * recollindex > /tmp/recolltrace 2>&1
|
311 |
30 3 * * * recollindex > /tmp/recolltrace 2>&1
|
263 |
|
312 |
|
|
... |
|
... |
333 |
Click on the Show query details link at the top of the result page to see
|
382 |
Click on the Show query details link at the top of the result page to see
|
334 |
the query expansion.
|
383 |
the query expansion.
|
335 |
|
384 |
|
336 |
----------------------------------------------------------------------
|
385 |
----------------------------------------------------------------------
|
337 |
|
386 |
|
|
|
387 |
3.3. Multiple databases
|
|
|
388 |
|
|
|
389 |
Your Recoll configuration always defines a main index. This is what gets
|
|
|
390 |
updated, for example, when you execute recollindex.
|
|
|
391 |
|
|
|
392 |
You can use the search configuration tool to define additional databases
|
|
|
393 |
to be searched. These databases can be made active or inactive at any
|
|
|
394 |
moment.
|
|
|
395 |
|
|
|
396 |
The typical use of this feature is for a system administrator to set up a
|
|
|
397 |
central index, that you may choose to search, or not, in addition to your
|
|
|
398 |
personal data. Of course, there are other possibilities.
|
|
|
399 |
|
|
|
400 |
The main index (defined by your personal configuration) is always active.
|
|
|
401 |
|
|
|
402 |
The list of searchable databases may also be defined by the
|
|
|
403 |
RECOLL_EXTRA_DBS environment variable. This should hold a colon-separated
|
|
|
404 |
list of index directories, for example:
|
|
|
405 |
|
|
|
406 |
export RECOLL_EXTRA_DBS=/some/place/xapiandb:/some/other/db
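
When the list of extra indexes is long or built by a script, the
colon-separated value can be assembled instead of typed by hand. The
sketch below uses a hypothetical join_dbs helper and reuses the
placeholder directories from the example above.

```shell
# Hypothetical helper: join its arguments with ':' characters.
# The function body runs in a subshell, so the IFS change stays local.
join_dbs() (
    IFS=:
    printf '%s\n' "$*"
)

RECOLL_EXTRA_DBS=$(join_dbs /some/place/xapiandb /some/other/db)
export RECOLL_EXTRA_DBS
```

Directory names that themselves contain a colon cannot be represented in
such a list.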
|
|
|
407 |
|
|
|
408 |
----------------------------------------------------------------------
|
|
|
409 |
|
338 |
3.3. Document history
|
410 |
3.4. Document history
|
339 |
|
411 |
|
340 |
Documents that you actually view (with the internal preview or an external
|
412 |
Documents that you actually view (with the internal preview or an external
|
341 |
tool) are entered into the document history, which is remembered. You can
|
413 |
tool) are entered into the document history, which is remembered. You can
|
342 |
display the history list by using the Tools/Doc History menu entry.
|
414 |
display the history list by using the Tools/Doc History menu entry.
|
343 |
|
415 |
|
344 |
----------------------------------------------------------------------
|
416 |
----------------------------------------------------------------------
|
345 |
|
417 |
|
346 |
3.4. Result list sorting
|
418 |
3.5. Result list sorting
|
347 |
|
419 |
|
348 |
The documents in a result list are normally sorted in order of relevance.
|
420 |
The documents in a result list are normally sorted in order of relevance.
|
349 |
It is possible to specify different sort parameters by using the Sort
|
421 |
It is possible to specify different sort parameters by using the Sort
|
350 |
parameters dialog (located in the Tools menu).
|
422 |
parameters dialog (located in the Tools menu).
|
351 |
|
423 |
|
|
... |
|
... |
357 |
the program exits. An activated sort is indicated in the result list
|
429 |
the program exits. An activated sort is indicated in the result list
|
358 |
header.
|
430 |
header.
|
359 |
|
431 |
|
360 |
----------------------------------------------------------------------
|
432 |
----------------------------------------------------------------------
|
361 |
|
433 |
|
|
|
434 |
3.6. Additional result list functionality
|
|
|
435 |
|
|
|
436 |
Apart from the preview and edit links, you can display a popup menu by
|
|
|
437 |
right-clicking over a paragraph in the result list. This menu has the
|
|
|
438 |
following entries:
|
|
|
439 |
|
|
|
440 |
* Preview
|
|
|
441 |
|
|
|
442 |
* Edit
|
|
|
443 |
|
|
|
444 |
* Copy File Name
|
|
|
445 |
|
|
|
446 |
* Copy Url
|
|
|
447 |
|
|
|
448 |
* More like this
|
|
|
449 |
|
|
|
450 |
The Preview and Edit entries do the same thing as the corresponding links.
|
|
|
451 |
The two following entries will copy either a URL or the file path to the
|
|
|
452 |
clipboard, for pasting into another application.
|
|
|
453 |
|
|
|
454 |
The More like this entry will select a number of relevant terms from the
|
|
|
455 |
current document and enter them into the simple search field. You can then
|
|
|
456 |
start a simple search, with a good chance of finding documents related to
|
|
|
457 |
the current result.
|
|
|
458 |
|
|
|
459 |
----------------------------------------------------------------------
|
|
|
460 |
|
362 |
3.5. Search tips, shortcuts
|
461 |
3.7. Search tips, shortcuts
|
363 |
|
462 |
|
364 |
Disabling stem expansion. Entering a capitalized word in any search field
|
463 |
Disabling stem expansion. Entering a capitalized word in any search field
|
365 |
will prevent stem expansion (no search for gardening if you enter Garden
|
464 |
will prevent stem expansion (no search for gardening if you enter Garden
|
366 |
instead of garden). This is the only case where character case should make
|
465 |
instead of garden). This is the only case where character case should make
|
367 |
a difference for a Recoll search.
|
466 |
a difference for a Recoll search.
|
|
... |
|
... |
369 |
Phrases. A phrase can be looked for by enclosing it in double quotes.
|
468 |
Phrases. A phrase can be looked for by enclosing it in double quotes.
|
370 |
Example: "user manual" will look only for occurrences of user immediately
|
469 |
Example: "user manual" will look only for occurrences of user immediately
|
371 |
followed by manual. You can use the This exact phrase field of the
|
470 |
followed by manual. You can use the This exact phrase field of the
|
372 |
advanced search dialog to the same effect.
|
471 |
advanced search dialog to the same effect.
|
373 |
|
472 |
|
|
|
473 |
Term completion. Typing ^TAB (Control+Tab) in the simple search entry
|
|
|
474 |
field while entering a word will either complete the current word if its
|
|
|
475 |
beginning matches a unique term in the index, or open a window to propose
|
|
|
476 |
a list of completions.
|
|
|
477 |
|
|
|
478 |
Picking up new terms for search from displayed documents. Double-clicking
|
|
|
479 |
on a word in the result list or in a preview window will copy it to the
|
|
|
480 |
simple search entry field.
|
|
|
481 |
|
|
|
482 |
Finding related documents. Selecting the More like this entry in the
|
|
|
483 |
result list paragraph right-click menu will select a set of "interesting"
|
|
|
484 |
terms from the current result, and insert them into the simple search
|
|
|
485 |
entry field. You can then possibly edit the list and start a search to
|
|
|
486 |
find documents which may be related to the current result.
|
|
|
487 |
|
374 |
Query explanation. You can get an exact description of what the query
|
488 |
Query explanation. You can get an exact description of what the query
|
375 |
looked for, including stem expansion, and boolean operators used, by
|
489 |
looked for, including stem expansion, and boolean operators used, by
|
376 |
clicking on the result list header.
|
490 |
clicking on the result list header.
|
377 |
|
491 |
|
378 |
File names. All file name elements (the broken up file path) are entered
|
492 |
File names. File names are added as terms during indexing, and you can
|
379 |
as terms during indexation, and you can specify them as ordinary terms in
|
493 |
specify them as ordinary terms in normal search fields (Recoll used to
|
380 |
normal search fields. Alternatively, you can use specific file name search
|
494 |
index all directories in the file path as terms. This has been abandoned
|
|
|
495 |
as it did not seem really useful). Alternatively, you can use specific
|
381 |
which will only look for file names and can use wildcard expansion.
|
496 |
file name search which will only look for file names and can use wildcard
|
|
|
497 |
expansion.
|
382 |
|
498 |
|
383 |
Quitting. Entering ^Q almost anywhere will close the application.
|
499 |
Quitting. Entering ^Q almost anywhere will close the application.
|
384 |
|
500 |
|
385 |
Closing previews. Entering ^W in a preview tab will close it (and, for the
|
501 |
Closing previews. Entering ^W in a preview tab will close it (and, for the
|
386 |
last tab, close the preview window).
|
502 |
last tab, close the preview window).
|
387 |
|
503 |
|
388 |
----------------------------------------------------------------------
|
504 |
----------------------------------------------------------------------
|
389 |
|
505 |
|
390 |
3.6. Customising the search interface
|
506 |
3.8. Customising the search interface
|
391 |
|
507 |
|
392 |
It is possible to customise some aspects of the search interface by using
|
508 |
It is possible to customise some aspects of the search interface by using
|
393 |
Query configuration entry in the Preferences menu.
|
509 |
Query configuration entry in the Preferences menu.
|
394 |
|
510 |
|
395 |
There are two tabs in the dialog, dealing with the interface itself, and
|
511 |
There are two tabs in the dialog, dealing with the interface itself, and
|
|
... |
|
... |
402 |
* Result list font: There is quite a lot of information shown in the
|
518 |
* Result list font: There is quite a lot of information shown in the
|
403 |
result list, and you may want to customise the font and/or font size.
|
519 |
result list, and you may want to customise the font and/or font size.
|
404 |
The rest of the fonts used by Recoll are determined by your generic QT
|
520 |
The rest of the fonts used by Recoll are determined by your generic QT
|
405 |
config (try the qtconfig command).
|
521 |
config (try the qtconfig command).
|
406 |
|
522 |
|
407 |
* Html help browser: this will let you chose your the preferred browser
|
523 |
* Html help browser: this will let you choose your preferred browser
|
408 |
which will be started from the Help menu to read the user manual. You
|
524 |
which will be started from the Help menu to read the user manual. You
|
409 |
can enter a simple name if the command is in your PATH, or browse for
|
525 |
can enter a simple name if the command is in your PATH, or browse for
|
410 |
a full pathname.
|
526 |
a full pathname.
|
411 |
|
527 |
|
412 |
* Show document type icons in result list: icons in the result list can
|
528 |
* Show document type icons in result list: icons in the result list can
|
413 |
be turned off. They take quite a lot of space and convey relatively
|
529 |
be turned off. They take quite a lot of space and convey relatively
|
414 |
little useful information.
|
530 |
little useful information.
|
|
|
531 |
|
|
|
532 |
* Auto-start simple search on whitespace entry: if this is checked, a
|
|
|
533 |
search will be executed each time you enter a space in the simple
|
|
|
534 |
search input field. This lets you look at the result list as you enter
|
|
|
535 |
new terms. This is off by default; you may or may not like it.
|
415 |
|
536 |
|
416 |
Search parameters:
|
537 |
Search parameters:
|
417 |
|
538 |
|
418 |
* Stemming language: stemming obviously depends on the document's
|
539 |
* Stemming language: stemming obviously depends on the document's
|
419 |
language. This listbox will let you choose among the stemming databases
|
540 |
language. This listbox will let you choose among the stemming databases
|
420 |
which were built during indexing (this is set in the main
|
541 |
which were built during indexing (this is set in the main
|
421 |
configuration file), or later added with recollindex -s (See the
|
542 |
configuration file), or later added with recollindex -s (See the
|
422 |
recollindex manual). Stemming languages which are dynamically added
|
543 |
recollindex manual). Stemming languages which are dynamically added
|
423 |
will be deleted at the next indexation pass unless they are also added
|
544 |
will be deleted at the next indexing pass unless they are also added
|
424 |
in the configuration file.
|
545 |
in the configuration file.
|
425 |
|
546 |
|
426 |
* Dynamically build abstracts: this decides if Recoll tries to build
|
547 |
* Dynamically build abstracts: this decides if Recoll tries to build
|
427 |
document abstracts when displaying the result list. Abstracts are
|
548 |
document abstracts when displaying the result list. Abstracts are
|
428 |
constructed by taking context from the document information, around
|
549 |
constructed by taking context from the document information, around
|
|
... |
|
... |
431 |
|
552 |
|
432 |
* Replace abstracts from documents: this decides if we should synthesize
|
553 |
* Replace abstracts from documents: this decides if we should synthesize
|
433 |
and display an abstract in place of an explicit abstract found within
|
554 |
and display an abstract in place of an explicit abstract found within
|
434 |
the document itself.
|
555 |
the document itself.
|
435 |
|
556 |
|
|
|
557 |
Extra databases:
|
|
|
558 |
|
|
|
559 |
This panel will let you browse for additional databases that you may want
|
|
|
560 |
to search. Extra databases are designated by their database directory (e.g.
|
|
|
561 |
/home/someothergui/.recoll/xapiandb, /usr/local/recollglobal/xapiandb).
|
|
|
562 |
|
|
|
563 |
Once entered, the databases will appear in the All extra databases list,
|
|
|
564 |
and you can choose which ones you want to use at any moment by transferring
|
|
|
565 |
them to/from the Active extra databases list.
|
|
|
566 |
|
|
|
567 |
Your main database (the one the current configuration indexes to), is
|
|
|
568 |
always implicitly active. If this is not desirable, you can set up your
|
|
|
569 |
configuration so that it indexes, for example, an empty directory.
|
|
|
570 |
|
436 |
----------------------------------------------------------------------
|
571 |
----------------------------------------------------------------------
|
437 |
|
572 |
|
438 |
Chapter 4. Installation
|
573 |
Chapter 4. Installation
|
439 |
|
574 |
|
440 |
4.1. Building from source
|
575 |
4.1. Building from source
|
441 |
|
576 |
|
442 |
4.1.1. Prerequisites
|
577 |
4.1.1. Prerequisites
|
443 |
|
578 |
|
444 |
At the very least, you will need to download and install the xapian core
|
579 |
At the very least, you will need to download and install the xapian core
|
445 |
package (Recoll currently uses version 0.9.2), and the qt runtime and
|
580 |
package (Recoll development currently uses version 0.9.5), and the qt
|
446 |
development packages (Recoll development currently uses version 3.3.5, but
|
581 |
runtime and development packages (Recoll development currently uses
|
447 |
any 3.3 version is probably ok).
|
582 |
version 3.3.5, but any 3.3 version is probably ok).
|
448 |
|
583 |
|
449 |
You will most probably be able to find a binary package for qt for your
|
584 |
You will most probably be able to find a binary package for qt for your
|
450 |
system. You may have to compile Xapian but this is not difficult (if you
|
585 |
system. You may have to compile Xapian but this is not difficult (if you
|
451 |
are using FreeBSD, there is a port).
|
586 |
are using FreeBSD, there is a port).
|
452 |
|
587 |
|
|
... |
|
... |
561 |
|
696 |
|
562 |
There are two sets of configuration files. The system-wide files are kept
|
697 |
There are two sets of configuration files. The system-wide files are kept
|
563 |
in a directory named like /usr/[local/]share/recoll/examples; they define
|
698 |
in a directory named like /usr/[local/]share/recoll/examples; they define
|
564 |
default values for the system. A parallel set of files exists in the
|
699 |
default values for the system. A parallel set of files exists in the
|
565 |
.recoll directory in your home (this can be changed with the
|
700 |
.recoll directory in your home (this can be changed with the
|
566 |
RECOLL_CONFDIR environment variable. The database is also kept in .recoll
|
701 |
RECOLL_CONFDIR environment variable).
|
567 |
by default, (this can be changed by a configuration parameter).
|
|
|
568 |
|
702 |
|
569 |
If the .recoll directory does not exist when recoll or recollindex are
|
703 |
If the .recoll directory does not exist when recoll or recollindex are
|
570 |
started, it will be created with a set of empty configuration files.
|
704 |
started, it will be created with a set of empty configuration files.
|
571 |
recoll will give you a chance to edit the configuration file before
|
705 |
recoll will give you a chance to edit the configuration file before
|
572 |
starting indexation. recollindex will proceed immediately.
|
706 |
starting indexing. recollindex will proceed immediately.
|
573 |
|
707 |
|
574 |
Most of the parameters specific to the recoll GUI are set through the
|
708 |
Most of the parameters specific to the recoll GUI are set through the
|
575 |
Preferences menu and stored in the standard QT place ($HOME/.qt/recollrc).
|
709 |
Preferences menu and stored in the standard QT place ($HOME/.qt/recollrc).
|
576 |
You probably do not want to edit this by hand.
|
710 |
You probably do not want to edit this by hand.
|
577 |
|
711 |
|
|
... |
|
... |
598 |
* Parameter assignment (name = value).
|
732 |
* Parameter assignment (name = value).
|
599 |
|
733 |
|
600 |
* Section definition ([somedirname]).
|
734 |
* Section definition ([somedirname]).
|
601 |
|
735 |
|
602 |
Section lines allow redefining some parameters for a directory subtree.
|
736 |
Section lines allow redefining some parameters for a directory subtree.
|
603 |
Some of the parameters used for indexation are looked up hierarchically
|
737 |
Some of the parameters used for indexing are looked up hierarchically from
|
604 |
from the more to the less specific. Not all parameters can be meaningfully
|
738 |
the more to the less specific. Not all parameters can be meaningfully
|
605 |
redefined; this is specified for each in the next section.
|
739 |
redefined; this is specified for each in the next section.
|
606 |
|
740 |
|
607 |
The tilde character (~) is expanded in file names to the name of the
|
741 |
The tilde character (~) is expanded in file names to the name of the
|
608 |
user's home directory.
|
742 |
user's home directory.
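
The syntax and hierarchical lookup described above can be illustrated with a
minimal Python sketch. This is not Recoll's own parser: the function names
parse_conf and lookup, the sample directory and the charset values are all
hypothetical, and treating '#' as a comment marker is an assumption.

```python
import os

def parse_conf(text):
    """Parse 'name = value' and '[somedirname]' lines into {section: {name: value}}."""
    sections = {"": {}}      # "" holds the top-level (global) parameters
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):   # assumption: '#' starts a comment
            continue
        if line.startswith("[") and line.endswith("]"):
            # Section definition: expand a leading ~ to the user's home directory
            current = os.path.expanduser(line[1:-1].strip())
            sections.setdefault(current, {})
            continue
        name, sep, value = line.partition("=")
        if sep:                                # parameter assignment
            sections[current][name.strip()] = value.strip()
    return sections

def lookup(sections, name, path):
    """Return the value of name that applies to path, most specific section first."""
    path = os.path.abspath(os.path.expanduser(path))
    while True:
        value = sections.get(path, {}).get(name)
        if value is not None:
            return value
        parent = os.path.dirname(path)
        if parent == path:                     # reached the filesystem root
            break
        path = parent
    return sections[""].get(name)              # fall back to the global value

# Hypothetical configuration text, for illustration only
SAMPLE = """
defaultcharset = iso-8859-1
[/srv/docs/russian]
defaultcharset = koi8-r
"""
```

With this sample, lookup(parse_conf(SAMPLE), "defaultcharset", path) would
pick the section value for a file under /srv/docs/russian and the global
value for anything else.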
|
609 |
|
743 |
|
|
... |
|
... |
617 |
recoll.conf is the main configuration file. It defines things like what to
|
751 |
recoll.conf is the main configuration file. It defines things like what to
|
618 |
index (top directories and things to ignore), and the default character
|
752 |
index (top directories and things to ignore), and the default character
|
619 |
set to use for document types which do not specify it internally.
|
753 |
set to use for document types which do not specify it internally.
|
620 |
|
754 |
|
621 |
The default configuration will index your home directory. If this is not
|
755 |
The default configuration will index your home directory. If this is not
|
622 |
appropriate, use recoll to copy the sample configuration, click Cancel,
|
756 |
appropriate, start recoll to create a blank configuration, click Cancel,
|
623 |
and edit the configuration file before restarting the command. This will
|
757 |
and edit the configuration file before restarting the command. This will
|
624 |
start the initial indexation, which may take some time.
|
758 |
start the initial indexing, which may take some time.
|
625 |
|
759 |
|
626 |
Parameters:
|
760 |
Parameters:
|
627 |
|
761 |
|
628 |
topdirs
|
762 |
topdirs
|
629 |
|
763 |
|
630 |
Specifies the list of directories or files to index (recursively
|
764 |
Specifies the list of directories or files to index (recursively
|
631 |
for directories). The indexer will not follow symbolic links
|
765 |
for directories). The indexer will not follow symbolic links
|
632 |
inside the indexed trees. If an entry in the topdirs list is a
|
766 |
inside the indexed trees. If an entry in the topdirs list is a
|
633 |
symbolic link, indexation will not start and will generate an
|
767 |
symbolic link, indexing will not start and will generate an error.
|
634 |
error.
|
|
|
635 |
|
768 |
|
636 |
skippedNames
|
769 |
skippedNames
|
637 |
|
770 |
|
638 |
A space-separated list of patterns for names of files or
|
771 |
A space-separated list of patterns for names of files or
|
639 |
directories that should be completely ignored. The list defined in
|
772 |
directories that should be completely ignored. The list defined in
|
|
... |
|
... |
660 |
Verbosity level for recoll and recollindex. A value of 4 lists
|
793 |
Verbosity level for recoll and recollindex. A value of 4 lists
|
661 |
quite a lot of debug/information messages. 2 only lists errors.
|
794 |
quite a lot of debug/information messages. 2 only lists errors.
|
662 |
|
795 |
|
663 |
logfilename
|
796 |
logfilename
|
664 |
|
797 |
|
665 |
Where should the messages go. 'stderr' can be used as a special
|
798 |
Where the messages should go. 'stderr' can be used as a special
|
666 |
value.
|
799 |
value, and is the default.
|
667 |
|
800 |
|
668 |
filtersdir
|
801 |
filtersdir
|
669 |
|
802 |
|
670 |
A directory to search for the external filter scripts used to
|
803 |
A directory to search for the external filter scripts used to
|
671 |
index some types of files. The value should not be changed, except
|
804 |
index some types of files. The value should not be changed, except
|
|
... |
|
... |
675 |
indexstemminglanguages
|
808 |
indexstemminglanguages
|
676 |
|
809 |
|
677 |
A list of languages for which the stem expansion databases will be
|
810 |
A list of languages for which the stem expansion databases will be
|
678 |
built. See recollindex(1) for possible values. You can add a stem
|
811 |
built. See recollindex(1) for possible values. You can add a stem
|
679 |
expansion database for a different language by using recollindex
|
812 |
expansion database for a different language by using recollindex
|
680 |
-s, but it will be deleted during the next indexation. Only
|
813 |
-s, but it will be deleted during the next indexing. Only
|
681 |
languages listed in the configuration file are permanent.
|
814 |
languages listed in the configuration file are permanent.
|
682 |
|
815 |
|
683 |
iconsdir
|
816 |
iconsdir
|
684 |
|
817 |
|
685 |
The name of the directory where recoll result list icons are
|
818 |
The name of the directory where recoll result list icons are
|
686 |
stored. You can change this if you want different images.
|
819 |
stored. You can change this if you want different images.
|
687 |
|
820 |
|
688 |
dbdir
|
821 |
dbdir
|
689 |
|
822 |
|
690 |
The name of the Xapian database directory. It will be created if
|
823 |
The name of the Xapian data directory. It will be created if
|
691 |
needed when the database is initialized.
|
824 |
needed when the index is initialized.
|
692 |
|
825 |
|
693 |
defaultcharset
|
826 |
defaultcharset
|
694 |
|
827 |
|
695 |
The name of the character set used for files that do not contain a
|
828 |
The name of the character set used for files that do not contain a
|
696 |
character set definition (e.g. plain text files). This can be
|
829 |
character set definition (e.g. plain text files). This can be
|
|
... |
|
... |
708 |
|
841 |
|
709 |
Decide if we use the file -i system command as a final step for
|
842 |
Decide if we use the file -i system command as a final step for
|
710 |
determining the mime type for a file (the main procedure uses
|
843 |
determining the mime type for a file (the main procedure uses
|
711 |
suffix associations as defined in the mimemap file). This can be
|
844 |
suffix associations as defined in the mimemap file). This can be
|
712 |
useful for files with suffixless names, but it will also cause the
|
845 |
useful for files with suffixless names, but it will also cause the
|
713 |
indexation of many bogus "text" files.
|
846 |
indexing of many bogus "text" files.
|
714 |
|
847 |
|
715 |
indexallfilenames
|
848 |
indexallfilenames
|
716 |
|
849 |
|
717 |
Recoll indexes file names in a special section of the database to
|
850 |
Recoll indexes file names in a special section of the database to
|
718 |
allow specific file name searches using wildcards. This
|
851 |
allow specific file name searches using wildcards. This
|
719 |
parameter decides if file name indexing is performed only for
|
852 |
parameter decides if file name indexing is performed only for
|
720 |
files with mime types that would qualify them for full text
|
853 |
files with mime types that would qualify them for full text
|
721 |
indexation, or for all files inside the selected subtrees,
|
854 |
indexing, or for all files inside the selected subtrees,
|
722 |
independent of mime type.
|
855 |
independent of mime type.
|
723 |
|
856 |
|
724 |
----------------------------------------------------------------------
|
857 |
----------------------------------------------------------------------
|
725 |
|
858 |
|
726 |
4.4.2. The mimemap file
|
859 |
4.4.2. The mimemap file
|
|
... |
|
... |
728 |
mimemap specifies the file name extension to mime type mappings.
|
861 |
mimemap specifies the file name extension to mime type mappings.
|
729 |
|
862 |
|
730 |
For file names without an extension, or with an unknown one, the system's
|
863 |
For file names without an extension, or with an unknown one, the system's
|
731 |
file -i command will be executed to determine the mime type (this can be
|
864 |
file -i command will be executed to determine the mime type (this can be
|
732 |
switched off inside the main configuration file).
|
865 |
switched off inside the main configuration file).
|
733 |
|
|
|
734 |
mimemap also has a list of extensions which should be ignored totally (to
|
|
|
735 |
avoid losing time by executing file for things that certainly should not
|
|
|
736 |
be indexed).
|
|
|
737 |
|
866 |
|
738 |
The mappings can be specified on a per-subtree basis, which may be useful
|
867 |
The mappings can be specified on a per-subtree basis, which may be useful
|
739 |
in some cases. Example: gaim logs have a .txt extension but should be
|
868 |
in some cases. Example: gaim logs have a .txt extension but should be
|
740 |
handled specially, which is possible because they are usually all located
|
869 |
handled specially, which is possible because they are usually all located
|
741 |
in one place.
|
870 |
in one place.
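
The lookup order described above (per-subtree mapping first, then the general
suffix mapping, then file -i as a last resort) can be sketched in Python as
follows. This is only an illustration, not Recoll's implementation: the two
dictionaries, the function names, the /home/me/.gaim path and the
text/x-gaim-log type are hypothetical stand-ins for what the mimemap file
would contain.

```python
import os
import subprocess

# Hypothetical mappings for illustration; the real ones live in the mimemap file.
GLOBAL_MAP = {".txt": "text/plain", ".html": "text/html"}
SUBTREE_MAPS = {"/home/me/.gaim": {".txt": "text/x-gaim-log"}}

def mime_from_suffix(path, global_map=GLOBAL_MAP, subtree_maps=SUBTREE_MAPS):
    """Map a file name suffix to a mime type, preferring the closest subtree mapping."""
    suffix = os.path.splitext(path)[1].lower()
    if not suffix:
        return None                      # suffixless name: nothing to look up
    directory = os.path.dirname(os.path.abspath(path))
    while True:
        mapping = subtree_maps.get(directory)
        if mapping and suffix in mapping:
            return mapping[suffix]
        parent = os.path.dirname(directory)
        if parent == directory:          # reached the filesystem root
            break
        directory = parent
    return global_map.get(suffix)

def mime_type(path, use_file_command=True):
    """Suffix lookup first, then optionally ask the system's file -i command."""
    mime = mime_from_suffix(path)
    if mime is None and use_file_command:
        try:
            result = subprocess.run(["file", "-b", "-i", path],
                                    capture_output=True, text=True, check=False)
        except FileNotFoundError:        # no file command on this system
            return None
        # Typical output looks like "text/plain; charset=us-ascii"
        mime = result.stdout.split(";")[0].strip() or None
    return mime
```

Passing use_file_command=False mirrors switching the file -i step off in the
main configuration file, so that suffixless or unknown names simply get no
type.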
|
|
... |
|
... |
748 |
|
877 |
|
749 |
----------------------------------------------------------------------
|
878 |
----------------------------------------------------------------------
|
750 |
|
879 |
|
751 |
4.4.3. The mimeconf file
|
880 |
4.4.3. The mimeconf file
|
752 |
|
881 |
|
753 |
mimeconf specifies how the different mime types are handled for
|
882 |
mimeconf specifies how the different mime types are handled for indexing,
|
754 |
indexation, and for display.
|
883 |
and for display.
|
755 |
|
884 |
|
756 |
Changing the indexation parameters is probably not a good idea except if
|
885 |
Changing the indexing parameters is probably not a good idea except if you
|
757 |
you are a Recoll developer.
|
886 |
are a Recoll developer.
|
758 |
|
887 |
|
759 |
You may want to adjust the external viewers defined in this file (e.g. html is either
|
888 |
You may want to adjust the external viewers defined in this file (e.g. html is either
|
760 |
previewed internally or displayed using firefox, but you may prefer
|
889 |
previewed internally or displayed using firefox, but you may prefer
|
761 |
mozilla, your openoffice.org program might be named ooffice instead of
|
890 |
mozilla, your openoffice.org program might be named ooffice instead of
|
762 |
openoffice ...). Look for the [view] section.
|
891 |
openoffice ...). Look for the [view] section.
|