|
a/src/README |
|
b/src/README |
|
|
1 |
|
|
|
2 |
More documentation can be found in the doc/ directory or at http://www.recoll.org
|
|
|
3 |
|
|
|
4 |
|
1 |
Recoll user manual
|
5 |
Recoll user manual
|
2 |
|
6 |
|
3 |
Jean-Francois Dockes
|
7 |
Jean-Francois Dockes
|
4 |
|
8 |
|
5 |
<jean-francois.dockes@wanadoo.fr>
|
9 |
<jean-francois.dockes@wanadoo.fr>
|
6 |
|
10 |
|
7 |
Copyright (c) 2005 Jean-Francois Dockes
|
11 |
Copyright (c) 2005 Jean-Francois Dockes
|
8 |
|
12 |
|
9 |
This document introduces full text search notions and describes
|
13 |
This document introduces full text search notions and describes the
|
10 |
the installation and use of the Recoll application.
|
14 |
installation and use of the Recoll application.
|
11 |
|
15 |
|
|
|
16 |
[ Split HTML / Single HTML ]
|
|
|
17 |
|
12 |
--------------------------------------------------------------
|
18 |
----------------------------------------------------------------------
|
13 |
|
19 |
|
14 |
Table of Contents
|
20 |
Table of Contents
|
15 |
|
21 |
|
16 |
1. Introduction
|
22 |
1. Introduction
|
17 |
|
23 |
|
|
... |
|
... |
55 |
|
61 |
|
56 |
4.1.3. Installation
|
62 |
4.1.3. Installation
|
57 |
|
63 |
|
58 |
4.2. Installing a prebuilt copy
|
64 |
4.2. Installing a prebuilt copy
|
59 |
|
65 |
|
60 |
4.2.1. Installing through a package
|
66 |
4.2.1. Installing through a package system
|
61 |
system
|
|
|
62 |
|
67 |
|
63 |
4.2.2. Installing a prebuilt Recoll
|
68 |
4.2.2. Installing a prebuilt Recoll
|
64 |
|
69 |
|
65 |
4.3. Configuration overview
|
70 |
4.3. Configuration overview
|
66 |
|
71 |
|
|
... |
|
... |
68 |
|
73 |
|
69 |
4.3.2. The mimemap file
|
74 |
4.3.2. The mimemap file
|
70 |
|
75 |
|
71 |
4.3.3. The mimeconf file
|
76 |
4.3.3. The mimeconf file
|
72 |
|
77 |
|
73 |
--------------------------------------------------------------
|
78 |
----------------------------------------------------------------------
|
74 |
|
79 |
|
75 |
Chapter 1. Introduction
|
80 |
Chapter 1. Introduction
|
76 |
|
81 |
|
77 |
1.1. Giving it a try
|
82 |
1.1. Giving it a try
|
78 |
|
83 |
|
79 |
If you do not like reading manuals (who does?) and would like to
|
84 |
If you do not like reading manuals (who does?) and would like to give
|
80 |
give Recoll a try, just perform installation and start the recoll
|
85 |
Recoll a try, just perform installation and start the recoll user
|
81 |
user interface, which will index your home directory and let you
|
86 |
interface, which will index your home directory and let you search it
|
82 |
search it right after.
|
87 |
right after.
|
83 |
|
88 |
|
84 |
Do not do this if your home has a huge number of documents and you
|
89 |
Do not do this if your home has a huge number of documents and you do not
|
85 |
do not want to wait or are very short on disk space. In this case,
|
90 |
want to wait or are very short on disk space. In this case, you may want
|
86 |
you may want to edit the configuration file first to restrict the
|
91 |
to edit the configuration file first to restrict the indexed area.
|
87 |
indexed area.
|
|
|
88 |
|
92 |
|
89 |
Also be aware that you will need to install the appropriate
|
93 |
Also be aware that you will need to install the appropriate supporting
|
90 |
supporting applications for document types that need them (for
|
94 |
applications for document types that need them (for example antiword for
|
91 |
example antiword for ms-word files).
|
95 |
ms-word files).
|
92 |
|
96 |
|
93 |
--------------------------------------------------------------
|
97 |
----------------------------------------------------------------------
|
94 |
|
98 |
|
95 |
1.2. Full text search
|
99 |
1.2. Full text search
|
96 |
|
100 |
|
97 |
Recoll is a full text search application. Full text search
|
101 |
Recoll is a full text search application. Full text search applications
|
98 |
applications let you find your data by content rather than by
|
102 |
let you find your data by content rather than by external attributes (like
|
99 |
external attributes (like a file name). More specifically, they
|
103 |
a file name). More specifically, they will let you specify words (terms)
|
100 |
will let you specify words (terms) that should or should not
|
104 |
that should or should not appear in the text you are looking for, and
|
101 |
appear in the text you are looking for, and return a list of
|
|
|
102 |
matching documents, ordered so that the most relevant documents
|
105 |
return a list of matching documents, ordered so that the most relevant
|
103 |
will appear first.
|
106 |
documents will appear first.
|
104 |
|
107 |
|
105 |
You do not need to remember in what file or email message you
|
108 |
You do not need to remember in what file or email message you stored a
|
106 |
stored a given piece of information. You just ask for related
|
109 |
given piece of information. You just ask for related terms, and the tool
|
107 |
terms, and the tool will return a list of documents where those
|
110 |
will return a list of documents where those terms are prominent.
|
108 |
terms are prominent.
|
|
|
109 |
|
111 |
|
110 |
This mode of operation has been made very familiar by internet
|
112 |
This mode of operation has been made very familiar by internet search
|
111 |
search engines.
|
113 |
engines.
|
112 |
|
114 |
|
113 |
The notion of relevance is a difficult one, as only you, the user,
|
115 |
The notion of relevance is a difficult one, as only you, the user,
|
114 |
actually know which documents are relevant to your search, and the
|
116 |
actually know which documents are relevant to your search, and the
|
115 |
application can only make a guess. The quality of this guess is
|
117 |
application can only make a guess. The quality of this guess is probably
|
116 |
probably the most important element for a search application.
|
118 |
the most important element for a search application.
|
117 |
|
119 |
|
118 |
In many cases, you are looking for all the forms of a word, not
|
120 |
In many cases, you are looking for all the forms of a word, not for a
|
119 |
for a specific form or spelling. These different forms may include
|
121 |
specific form or spelling. These different forms may include plurals,
|
120 |
plurals, different tenses for a verb, or terms derived from the
|
122 |
different tenses for a verb, or terms derived from the same root or stem
|
121 |
same root or stem (example: floor, floors, floored, floorings...).
|
123 |
(example: floor, floors, floored, floorings...). Recoll will by default
|
122 |
Recoll will by default expand queries to all such related terms
|
124 |
expand queries to all such related terms (words that reduce to the same
|
123 |
(words that reduce to the same stem). This expansion can be
|
125 |
stem). This expansion can be disabled at search time.
|
124 |
disabled at search time.
|
|
|
125 |
|
126 |
|
126 |
Stemming, by itself, does not provide for misspellings or phonetic
|
127 |
Stemming, by itself, does not provide for misspellings or phonetic
|
127 |
searches. Recoll currently does not support these.
|
128 |
searches. Recoll currently does not support these.
|
128 |
|
129 |
|
129 |
--------------------------------------------------------------
|
130 |
----------------------------------------------------------------------
|
130 |
|
131 |
|
131 |
1.3. Recoll overview
|
132 |
1.3. Recoll overview
|
132 |
|
133 |
|
133 |
Recoll uses the Xapian information retrieval library as its
|
134 |
Recoll uses the Xapian information retrieval library as its storage and
|
134 |
storage and retrieval engine. Xapian is a very mature package
|
135 |
retrieval engine. Xapian is a very mature package using a sophisticated
|
135 |
using a sophisticated probabilistic ranking model. Recoll provides
|
136 |
probabilistic ranking model. Recoll provides the interface to get data
|
136 |
the interface to get data into (indexation) and out (searching) of
|
137 |
into (indexation) and out (searching) of the system.
|
137 |
the system.
|
|
|
138 |
|
138 |
|
139 |
In practice, Xapian works by remembering where terms appear in
|
139 |
In practice, Xapian works by remembering where terms appear in your
|
140 |
your document files. The acquisition process is called indexation.
|
140 |
document files. The acquisition process is called indexation.
|
141 |
|
141 |
|
142 |
The resulting database can be big (roughly the size of the
|
142 |
The resulting database can be big (roughly the size of the original
|
143 |
original document set), but it is not a document archive. Recoll
|
143 |
document set), but it is not a document archive. Recoll can only display
|
144 |
can only display documents that still exist at the place from
|
144 |
documents that still exist at the place from which they were indexed.
|
145 |
which they were indexed. (Actually, there is a way to reconstruct
|
145 |
(Actually, there is a way to reconstruct a document from the information
|
146 |
a document from the information in the database, but the result is
|
146 |
in the database, but the result is not nice, as all formatting,
|
147 |
not nice, as all formatting, punctuation and capitalisation are
|
147 |
punctuation and capitalisation are lost).
|
148 |
lost).
|
|
|
149 |
|
148 |
|
150 |
Recoll stores all internal data in Unicode UTF-8 format, and it
|
149 |
Recoll stores all internal data in Unicode UTF-8 format, and it can index
|
151 |
can index files with different character sets, encodings, and
|
150 |
files with different character sets, encodings, and languages into the
|
152 |
languages into the same database. It has input filters for many
|
151 |
same database. It has input filters for many document types.
|
153 |
document types.
|
|
|
154 |
|
152 |
|
155 |
Stemming depends on the document language. Recoll stores the
|
153 |
Stemming depends on the document language. Recoll stores the unstemmed
|
156 |
unstemmed versions of terms and uses auxiliary databases for term
|
154 |
versions of terms and uses auxiliary databases for term expansion. It can
|
157 |
expansion. It can switch stemming languages, or add a language,
|
155 |
switch stemming languages, or add a language, without reindexing. Storing
|
158 |
without reindexing. Storing documents in different languages in
|
156 |
documents in different languages in the same database is possible, and
|
159 |
the same database is possible, and useful in practice, but does
|
157 |
useful in practice, but does introduce possibilities of confusion. Recoll
|
160 |
introduce possibilities of confusion. Recoll currently makes no
|
|
|
161 |
attempt at automatic language recognition.
|
158 |
currently makes no attempt at automatic language recognition.
|
162 |
|
159 |
|
163 |
Recoll has many parameters which define exactly what to index, and
|
160 |
Recoll has many parameters which define exactly what to index, and how to
|
164 |
how to classify and decode the source documents. These are kept in
|
161 |
classify and decode the source documents. These are kept in a
|
165 |
a configuration file. A default configuration is copied into a
|
162 |
configuration file. A default configuration is copied into a standard
|
166 |
standard location (usually something like
|
163 |
location (usually something like /usr/[local/]share/recoll/examples)
|
167 |
/usr/[local/]share/recoll/examples) during installation. The
|
164 |
during installation. The default parameters from this file may be
|
168 |
default parameters from this file may be overridden by values that
|
165 |
overridden by values that you set inside your personal configuration, found
|
169 |
you set inside your personal configuration, found by default in
|
|
|
170 |
the .recoll subdirectory of your home directory. The default
|
166 |
by default in the .recoll subdirectory of your home directory. The default
|
171 |
configuration will index your home directory with default
|
167 |
configuration will index your home directory with default parameters and
|
172 |
parameters and should be sufficient for giving Recoll a try, but
|
168 |
should be sufficient for giving Recoll a try, but you may want to adjust
|
173 |
you may want to adjust it later.
|
169 |
it later.
|
174 |
|
170 |
|
175 |
Indexation is started automatically the first time you execute the
|
171 |
Indexation is started automatically the first time you execute the recoll
|
176 |
recoll search graphical user interface, or by executing the
|
172 |
search graphical user interface, or by executing the recollindex command.
|
177 |
recollindex command.
|
|
|
178 |
|
173 |
|
179 |
Searches are performed inside the recoll program, which has many
|
174 |
Searches are performed inside the recoll program, which has many options
|
180 |
options to help you find what you are looking for.
|
175 |
to help you find what you are looking for.
|
181 |
|
176 |
|
182 |
--------------------------------------------------------------
|
177 |
----------------------------------------------------------------------
|
183 |
|
178 |
|
184 |
Chapter 2. Indexation
|
179 |
Chapter 2. Indexation
|
185 |
|
180 |
|
186 |
2.1. Introduction
|
181 |
2.1. Introduction
|
187 |
|
182 |
|
188 |
Indexation is the process by which the set of documents is
|
183 |
Indexation is the process by which the set of documents is analyzed and
|
189 |
analyzed and the data entered into the database. Recoll indexation
|
184 |
the data entered into the database. Recoll indexation is normally
|
190 |
is normally incremental: documents will only be processed if they
|
185 |
incremental: documents will only be processed if they have been modified.
|
191 |
have been modified. On the first execution, of course, all
|
186 |
On the first execution, of course, all documents will need processing. A
|
192 |
documents will need processing. A full index build can be forced
|
187 |
full index build can be forced later on by specifying an option to the
|
193 |
later on by specifying an option to the indexation command
|
188 |
indexation command (recollindex -z).
|
194 |
(recollindex -z).
|
|
|
195 |
|
189 |
|
196 |
Recoll indexation takes place at discrete times. There is
|
190 |
Recoll indexation takes place at discrete times. There is currently no
|
197 |
currently no interface to real time file modification monitors.
|
191 |
interface to real time file modification monitors. The typical usage is to
|
198 |
The typical usage is to have a nightly indexation run programmed
|
192 |
have a nightly indexation run programmed into your cron file.
|
199 |
into your cron file.
|
|
|
200 |
|
193 |
|
201 |
+----------------------------------------------------------------+
|
194 |
+------------------------------------------------------------------------+
|
202 |
| Side note: there is nothing in Recoll and Xapian that would |
|
195 |
| Side note: there is nothing in Recoll and Xapian that would prevent |
|
203 |
| prevent interfacing with a real time file modification |
|
196 |
| interfacing with a real time file modification monitor, but this would |
|
204 |
| monitor, but this would tend to consume significant system |
|
197 |
| tend to consume significant system resources for dubious gain, because |
|
205 |
| resources for dubious gain, because you rarely need a full |
|
198 |
| you rarely need a full text search to find documents you just |
|
206 |
| text search to find documents you just modified. recollindex |
|
|
|
207 |
| -i can be used to add individual files to the index if you |
|
199 |
| modified. recollindex -i can be used to add individual files to the |
|
208 |
| want to play with this, see the manual page. |
|
200 |
| index if you want to play with this, see the manual page. |
|
209 |
+----------------------------------------------------------------+
|
201 |
+------------------------------------------------------------------------+
|
210 |
|
202 |
|
211 |
Recoll knows about quite a few different document types. The
|
203 |
Recoll knows about quite a few different document types. The parameters
|
212 |
parameters for document type recognition and processing are set
|
204 |
for document type recognition and processing are set in configuration
|
213 |
in configuration files. Most file types, like HTML or word
|
205 |
files. Most file types, like HTML or word processing files, only hold one
|
214 |
processing files, only hold one document. Some file types, like
|
206 |
document. Some file types, like mail folder files can hold many
|
215 |
mail folder files can hold many individually indexed documents.
|
207 |
individually indexed documents.
|
216 |
|
208 |
|
217 |
Recoll indexation processes plain text, HTML, openoffice and
|
209 |
Recoll indexation processes plain text, HTML, openoffice and e-mail files
|
218 |
e-mail files internally. Other types (e.g. postscript, pdf,
|
210 |
internally. Other types (e.g. postscript, pdf, ms-word, rtf) need external
|
219 |
ms-word, rtf) need external applications for preprocessing. The
|
211 |
applications for preprocessing. The list is in the installation section.
|
220 |
list is in the installation section.
|
|
|
221 |
|
212 |
|
222 |
Without further configuration, Recoll will index all appropriate
|
213 |
Without further configuration, Recoll will index all appropriate files
|
223 |
files from your home directory, with a reasonable set of defaults.
|
214 |
from your home directory, with a reasonable set of defaults.
|
224 |
|
215 |
|
225 |
--------------------------------------------------------------
|
216 |
----------------------------------------------------------------------
|
226 |
|
217 |
|
227 |
2.2. The indexation configuration
|
218 |
2.2. The indexation configuration
|
228 |
|
219 |
|
229 |
Values set in the system-wide configuration file (named like
|
220 |
Values set in the system-wide configuration file (named like
|
230 |
/usr/[local/]share/recoll/examples/recoll.conf) can be overridden
|
221 |
/usr/[local/]share/recoll/examples/recoll.conf) can be overridden by those
|
231 |
by those set in the personal one, named $HOME/.recoll/recoll.conf
|
222 |
set in the personal one, named $HOME/.recoll/recoll.conf by default or
|
232 |
by default or $RECOLL_CONFDIR/recoll.conf if RECOLL_CONFDIR is
|
223 |
$RECOLL_CONFDIR/recoll.conf if RECOLL_CONFDIR is set.
|
233 |
set.
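The lookup rule for the personal configuration file can be sketched in shell. This is only an illustration of the paragraph above: the function name recoll_personal_conf is made up for the example, and treating an empty RECOLL_CONFDIR like an unset one is an assumption, not documented Recoll behaviour.

```shell
# Sketch: print the personal configuration file path described in the manual.
# recoll_personal_conf is a hypothetical helper, not part of Recoll.
recoll_personal_conf() {
    # Use RECOLL_CONFDIR when it is set (assumed: and non-empty),
    # otherwise fall back to the default $HOME/.recoll directory.
    printf '%s/recoll.conf\n' "${RECOLL_CONFDIR:-$HOME/.recoll}"
}

# Show which personal file would be used in the current environment.
recoll_personal_conf
```

Running it with and without RECOLL_CONFDIR exported is a quick way to check which personal file you should be editing.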
|
|
|
234 |
|
224 |
|
235 |
The most accurate documentation for editing the file is given by
|
225 |
The most accurate documentation for editing the file is given by comments
|
236 |
comments inside the central one. If you want to adjust the
|
226 |
inside the central one. If you want to adjust the configuration before
|
237 |
configuration before indexation, just click Cancel when the
|
227 |
indexation, just click Cancel when the program asks if it should start
|
238 |
program asks if it should start initial indexation. This will have
|
228 |
initial indexation. This will have created a .recoll directory containing
|
239 |
created a .recoll directory containing empty configuration files.
|
229 |
empty configuration files.
|
240 |
|
230 |
|
241 |
The configuration is also documented inside the installation
|
231 |
The configuration is also documented inside the installation chapter of
|
242 |
chapter of this document, or in the recoll.conf(5) man page.
|
232 |
this document, or in the recoll.conf(5) man page.
|
243 |
|
233 |
|
244 |
--------------------------------------------------------------
|
234 |
----------------------------------------------------------------------
|
245 |
|
235 |
|
246 |
2.3. Starting indexation
|
236 |
2.3. Starting indexation
|
247 |
|
237 |
|
248 |
Indexation is performed either by the recollindex program, or by
|
238 |
Indexation is performed either by the recollindex program, or by the
|
249 |
the indexation thread inside the recoll program (use the File
|
239 |
indexation thread inside the recoll program (use the File menu).
|
250 |
menu).
|
|
|
251 |
|
240 |
|
252 |
If the recoll program finds no database when it starts, it will
|
241 |
If the recoll program finds no database when it starts, it will
|
253 |
automatically start indexation (except if cancelled).
|
242 |
automatically start indexation (except if cancelled).
|
254 |
|
243 |
|
255 |
It is best to avoid interrupting the indexation process, as this
|
244 |
It is best to avoid interrupting the indexation process, as this may
|
256 |
may sometimes leave the database in a bad state. This is not a
|
245 |
sometimes leave the database in a bad state. This is not a serious
|
257 |
serious problem, as you then just need to clear everything and
|
246 |
problem, as you then just need to clear everything and restart the
|
258 |
restart the indexation: the database files are normally stored in
|
247 |
indexation: the database files are normally stored in the
|
259 |
the $HOME/.recoll/xapiandb directory, which you can just delete if
|
248 |
$HOME/.recoll/xapiandb directory, which you can just delete if needed.
|
260 |
needed. Alternatively, you can start recollindex -z, which will
|
249 |
Alternatively, you can start recollindex -z, which will reset the database
|
261 |
reset the database before indexation.
|
250 |
before indexation.
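The recovery procedure above could look like the following sketch. The function name rebuild_recoll_index and its safety check are illustrative additions; only the xapiandb location, the recollindex command and its -z option come from this manual.

```shell
# Sketch: clear a damaged index and rebuild it from scratch.
# rebuild_recoll_index is a hypothetical helper, not part of Recoll.
rebuild_recoll_index() {
    # Default database location taken from the manual.
    dbdir="${1:-$HOME/.recoll/xapiandb}"
    # Safety check (our addition): only remove a path ending in /xapiandb.
    case "$dbdir" in
        */xapiandb) ;;
        *) echo "refusing to remove $dbdir" >&2; return 1 ;;
    esac
    rm -rf -- "$dbdir"
    # With the database gone, every document needs processing again.
    recollindex
}

# Alternative mentioned in the manual (not run here):
#   recollindex -z
```

The function is only defined here, not called, so nothing is deleted until you invoke it yourself.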
|
262 |
|
251 |
|
263 |
--------------------------------------------------------------
|
252 |
----------------------------------------------------------------------
|
264 |
|
253 |
|
265 |
2.4. Using cron to automate indexation
|
254 |
2.4. Using cron to automate indexation
|
266 |
|
255 |
|
267 |
The most common way to set up indexation is to have a cron task
|
256 |
The most common way to set up indexation is to have a cron task execute it
|
268 |
execute it every night. For example the following crontab entry
|
257 |
every night. For example the following crontab entry would do it every day
|
269 |
would do it every day at 3:30AM (supposing recollindex is in your
|
258 |
at 3:30AM (supposing recollindex is in your PATH):
|
270 |
PATH):
|
|
|
271 |
|
259 |
|
272 |
30 3 * * * recollindex > /tmp/recolltrace 2>&1
|
260 |
30 3 * * * recollindex > /tmp/recolltrace 2>&1
|
273 |
|
261 |
|
274 |
The usual command to edit your crontab is crontab -e (which will
|
262 |
The usual command to edit your crontab is crontab -e (which will usually
|
275 |
usually start the vi editor to edit the file). You may have more
|
263 |
start the vi editor to edit the file). You may have more sophisticated
|
276 |
sophisticated tools available on your system.
|
264 |
tools available on your system.
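If you prefer not to edit the crontab by hand, the entry can be appended from a script. The sketch below is one possible approach, assuming a cron implementation whose crontab command accepts "-" to read the new table from standard input; the names RECOLL_CRON_ENTRY and with_recoll_entry are invented for the example.

```shell
# The entry from the manual; assumes recollindex is in the PATH seen by cron.
RECOLL_CRON_ENTRY='30 3 * * * recollindex > /tmp/recolltrace 2>&1'

# Read an existing crontab on stdin and write it back, appending the
# entry only if an identical line is not already present.
with_recoll_entry() {
    awk -v entry="$RECOLL_CRON_ENTRY" '
        { print }
        $0 == entry { found = 1 }
        END { if (!found) print entry }
    '
}

# To install for the current user (not run here):
#   crontab -l 2>/dev/null | with_recoll_entry | crontab -
```

Because the filter checks for an existing identical line, running the install command twice should not duplicate the entry.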
|
277 |
|
265 |
|
278 |
--------------------------------------------------------------
|
266 |
----------------------------------------------------------------------
|
279 |
|
267 |
|
280 |
Chapter 3. Search
|
268 |
Chapter 3. Search
|
281 |
|
269 |
|
282 |
The recoll program provides the user interface for searching. It
|
270 |
The recoll program provides the user interface for searching. It is based
|
283 |
is based on the Qt library.
|
271 |
on the Qt library.
|
284 |
|
272 |
|
285 |
--------------------------------------------------------------
|
273 |
----------------------------------------------------------------------
|
286 |
|
274 |
|
287 |
3.1. Simple search
|
275 |
3.1. Simple search
|
288 |
|
276 |
|
289 |
1. Start the recoll program.
|
277 |
1. Start the recoll program.
|
290 |
|
278 |
|
291 |
2. Possibly choose a search mode: Any term or All terms or File
|
279 |
2. Possibly choose a search mode: Any term or All terms or File name.
|
292 |
name.
|
|
|
293 |
|
280 |
|
294 |
3. Enter search term(s) in the text field at the top of the
|
281 |
3. Enter search term(s) in the text field at the top of the window.
|
295 |
window.
|
|
|
296 |
|
282 |
|
297 |
4. Click the Search button or hit the Enter key to start the
|
283 |
4. Click the Search button or hit the Enter key to start the search.
|
298 |
search.
|
|
|
299 |
|
284 |
|
300 |
The initial default search mode is Any term. This will look for
|
285 |
The initial default search mode is Any term. This will look for documents
|
301 |
documents with any of the search terms (the ones with more terms
|
286 |
with any of the search terms (the ones with more terms will get better
|
302 |
will get better scores). All terms will ensure that only documents
|
287 |
scores). All terms will ensure that only documents with all the terms will
|
303 |
with all the terms will be returned. File name will specifically
|
288 |
be returned. File name will specifically look for file names, and allows
|
304 |
look for file names, and allows using wildcards (*, ? , []).
|
289 |
using wildcards (*, ? , []).
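For readers unfamiliar with these wildcards, the following shell sketch shows how *, ? and [] match under ordinary shell glob rules. It assumes that the File name search mode interprets them the same way, which this manual does not spell out; glob_match is a hypothetical helper and does not call Recoll.

```shell
# Returns success when NAME ($1) matches PATTERN ($2) under shell glob rules.
# Assumption: Recoll's *, ? and [] behave like their shell counterparts.
glob_match() {
    case "$1" in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

# "*" matches any run of characters, "?" exactly one, "[]" one from a set.
glob_match "report2005.pdf" "report*.pdf"       && echo "star matches"
glob_match "notes1.txt"     "notes?.txt"        && echo "question mark matches"
glob_match "chapter3.html"  "chapter[1-4].html" && echo "bracket matches"
```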
|
305 |
|
290 |
|
306 |
You can use the Tools / Advanced search dialog for more complex
|
291 |
You can use the Tools / Advanced search dialog for more complex searches.
|
307 |
searches.
|
|
|
308 |
|
292 |
|
309 |
After starting a search, a list of results will instantly be
|
293 |
After starting a search, a list of results will instantly be displayed in
|
310 |
displayed in the main list window. Clicking on the Preview link
|
294 |
the main list window. Clicking on the Preview link for an entry will open
|
311 |
for an entry will open an internal preview window for the
|
295 |
an internal preview window for the document. Clicking the Edit link will
|
312 |
document. Clicking the Edit link will attempt to start an external
|
296 |
attempt to start an external viewer (have a look at the mimeconf
|
313 |
viewer (have a look at the mimeconf configuration file to see how
|
297 |
configuration file to see how these are configured).
|
314 |
these are configured).
|
|
|
315 |
|
298 |
|
316 |
By default, the document list is presented in order of relevance
|
299 |
By default, the document list is presented in order of relevance (how well
|
317 |
(how well the system estimates that the document matches the
|
300 |
the system estimates that the document matches the query). You can specify
|
318 |
query). You can specify a different ordering by using the Tools /
|
301 |
a different ordering by using the Tools / Sort parameters dialog.
|
319 |
Sort parameters dialog.
|
|
|
320 |
|
302 |
|
321 |
The Preview and Edit links may not be present for all
|
303 |
The Preview and Edit links may not be present for all entries,
|
322 |
entries, meaning that Recoll has no configured way to preview a
|
304 |
meaning that Recoll has no configured way to preview a given file type
|
323 |
given file type (which was indexed by name only), or no configured
|
305 |
(which was indexed by name only), or no configured external viewer for the
|
324 |
external viewer for the file type. This can sometimes be adjusted
|
306 |
file type. This can sometimes be adjusted simply by tweaking the mimemap
|
325 |
simply by tweaking the mimemap and mimeconf configuration files.
|
307 |
and mimeconf configuration files.
|
326 |
|
308 |
|
327 |
You can click on the Query details link at the top of the results
|
309 |
You can click on the Query details link at the top of the results page to
|
328 |
page to see the query actually performed, after stem expansion and
|
310 |
see the query actually performed, after stem expansion and other
|
329 |
other processing.
|
311 |
processing.
|
330 |
|
312 |
|
331 |
--------------------------------------------------------------
|
313 |
----------------------------------------------------------------------
|
332 |
|
314 |
|
333 |
3.2. Complex/advanced search
|
315 |
3.2. Complex/advanced search
|
334 |
|
316 |
|
335 |
The advanced search dialog has fields that will allow a more
|
317 |
The advanced search dialog has fields that will allow a more refined
|
336 |
refined search, looking for documents with all given words, a
|
318 |
search, looking for documents with all given words, a given exact phrase,
|
337 |
given exact phrase, none of the given words, or a given file name
|
319 |
none of the given words, or a given file name (with wildcard expansion).
|
338 |
(with wildcard expansion). All relevant fields will be combined by
|
320 |
All relevant fields will be combined by an implicit AND clause.
|
339 |
an implicit AND clause.
|
|
|
340 |
|
321 |
|
341 |
It will let you search for documents of specific mime types (e.g.
|
322 |
It will let you search for documents of specific mime types (e.g. only
|
342 |
only text/plain, or text/html or application/pdf etc.).
|
323 |
text/plain, or text/html or application/pdf etc.).
|
343 |
|
324 |
|
344 |
It will let you restrict the search results to a subtree of the
|
325 |
It will let you restrict the search results to a subtree of the indexed
|
345 |
indexed area.
|
326 |
area.
|
346 |
|
327 |
|
347 |
Click on the Start Search button in the advanced search dialog to
|
328 |
Click on the Start Search button in the advanced search dialog to start
|
348 |
start the search. The button in the main window always performs a
|
329 |
the search. The button in the main window always performs a simple search.
|
349 |
simple search.
|
|
|
350 |
|
330 |
|
351 |
Click on the Show query details link at the top of the result page
|
331 |
Click on the Show query details link at the top of the result page to see
|
352 |
to see the query expansion.
|
332 |
the query expansion.
|
353 |
|
333 |
|
354 |
--------------------------------------------------------------
|
334 |
----------------------------------------------------------------------
|
355 |
|
335 |
|
356 |
3.3. Document history
|
336 |
3.3. Document history
|
357 |
|
337 |
|
358 |
Documents that you actually view (with the internal preview or an
|
338 |
Documents that you actually view (with the internal preview or an external
|
359 |
external tool) are entered into the document history, which is
|
339 |
tool) are entered into the document history, which is remembered. You can
|
360 |
remembered. You can display the history list by using the
|
340 |
display the history list by using the Tools/Doc History menu entry.
|
361 |
Tools/Doc History menu entry.
|
|
|
362 |
|
341 |
|
363 |
--------------------------------------------------------------
|
342 |
----------------------------------------------------------------------
|
364 |
|
343 |
|
365 |
3.4. Result list sorting
|
344 |
3.4. Result list sorting
|
366 |
|
345 |
|
367 |
The documents in a result list are normally sorted in order of
|
346 |
The documents in a result list are normally sorted in order of relevance.
|
368 |
relevance. It is possible to specify different sort parameters by
|
347 |
It is possible to specify different sort parameters by using the Sort
|
369 |
using the Sort parameters dialog (located in the Tools menu).
|
348 |
parameters dialog (located in the Tools menu).
|
370 |
|
349 |
|
371 |
The tool sorts a specified number of the most relevant documents
|
350 |
The tool sorts a specified number of the most relevant documents in the
|
372 |
in the result list, according to specified criteria. The currently
|
351 |
result list, according to specified criteria. The currently available
|
373 |
available criteria are date and mime type.
|
352 |
criteria are date and mime type.
|
374 |
|
353 |
|
375 |
The sort parameters stay in effect until they are explicitly
|
354 |
The sort parameters stay in effect until they are explicitly reset, or
|
376 |
reset, or the program exits. An activated sort is indicated in the
|
355 |
the program exits. An activated sort is indicated in the result list
|
377 |
result list header.
|
356 |
header.
|
378 |
|
357 |
|
379 |
--------------------------------------------------------------
|
358 |
----------------------------------------------------------------------
|
380 |
|
359 |
|
381 |
3.5. Search tips, shortcuts
|
360 |
3.5. Search tips, shortcuts
|
382 |
|
361 |
|
383 |
Disabling stem expansion. Entering a capitalized word in any
|
362 |
Disabling stem expansion. Entering a capitalized word in any search field
|
384 |
search field will prevent stem expansion (no search for gardening
|
363 |
will prevent stem expansion (no search for gardening if you enter Garden
|
385 |
if you enter Garden instead of garden). This is the only case
|
364 |
instead of garden). This is the only case where character case should make
|
386 |
where character case should make a difference for a Recoll search.
|
365 |
a difference for a Recoll search.
|
387 |
|
366 |
|
388 |
Phrases. A phrase can be looked for by enclosing it in double
|
367 |
Phrases. A phrase can be looked for by enclosing it in double quotes.
|
389 |
quotes. Example: "user manual" will look only for occurrences of
|
368 |
Example: "user manual" will look only for occurrences of user immediately
|
390 |
user immediately followed by manual. You can use the This exact
|
369 |
followed by manual. You can use the This exact phrase field of the
|
391 |
phrase field of the advanced search dialog to the same effect.
|
370 |
advanced search dialog to the same effect.
|
392 |
|
371 |
|
393 |
Query explanation. You can get an exact description of what the
|
372 |
Query explanation. You can get an exact description of what the query
|
394 |
query looked for, including stem expansion, and boolean operators
|
373 |
looked for, including stem expansion, and boolean operators used, by
|
395 |
used, by clicking on the result list header.
|
374 |
clicking on the result list header.
|
396 |
|
375 |
|
397 |
File names. All file name elements (the broken up file path) are
|
376 |
File names. All file name elements (the broken up file path) are entered
|
398 |
entered as terms during indexation, and you can specify them as
|
377 |
as terms during indexation, and you can specify them as ordinary terms in
|
399 |
ordinary terms in normal search fields. Alternatively, you can use
|
378 |
normal search fields. Alternatively, you can use specific file name search
|
400 |
specific file name search which will only look for file names and
|
379 |
which will only look for file names and can use wildcard expansion.
|
401 |
can use wildcard expansion.
|
|
|
402 |
|
380 |
|
403 |
Quitting. Entering ^Q almost anywhere will close the application.
|
381 |
Quitting. Entering ^Q almost anywhere will close the application.
|
404 |
|
382 |
|
405 |
Closing previews. Entering ^W in a preview tab will close it (and,
|
383 |
Closing previews. Entering ^W in a preview tab will close it (and, for the
|
406 |
for the last tab, close the preview window).
|
384 |
last tab, close the preview window).
|
407 |
|
385 |
|
408 |
--------------------------------------------------------------
|
386 |
----------------------------------------------------------------------
|
409 |
|
387 |
|
410 |
3.6. Customising the search interface
|
388 |
3.6. Customising the search interface
|
411 |
|
389 |
|
412 |
It is possible to customise some aspects of the search interface
|
390 |
It is possible to customise some aspects of the search interface by using
|
413 |
by using Query configuration entry in the Preferences menu.
|
391 |
Query configuration entry in the Preferences menu.
|
414 |
|
392 |
|
415 |
There are two tabs in the dialog, dealing with the interface
|
393 |
There are two tabs in the dialog, dealing with the interface itself, and
|
416 |
itself, and with the parameters used for searching and returning
|
394 |
with the parameters used for searching and returning results.
|
417 |
results.
|
|
|
418 |
|
395 |
|
419 |
User interface parameters:
|
396 |
User interface parameters:
|
420 |
|
397 |
|
421 |
* Number of results in a result page
|
398 |
* Number of results in a result page
|
422 |
|
399 |
|
423 |
* Result list font: There is quite a lot of information shown in
|
400 |
* Result list font: There is quite a lot of information shown in the
|
424 |
the result list, and you may want to customise the font and/or
|
401 |
result list, and you may want to customise the font and/or font size.
|
425 |
font size. The rest of the fonts used by Recoll are determined
|
402 |
The rest of the fonts used by Recoll are determined by your generic QT
|
426 |
by your generic QT config (try the qtconfig command).
|
403 |
config (try the qtconfig command).
|
427 |
|
404 |
|
428 |
* Html help browser: this will let you choose your preferred
|
405 |
* Html help browser: this will let you choose your preferred browser
|
429 |
browser which will be started from the Help menu to read the
|
406 |
which will be started from the Help menu to read the user manual. You
|
430 |
user manual. You can enter a simple name if the command is in
|
407 |
can enter a simple name if the command is in your PATH, or browse for
|
431 |
your PATH, or browse for a full pathname.
|
408 |
a full pathname.
|
432 |
|
409 |
|
433 |
* Show document type icons in result list: icons in the result
|
410 |
* Show document type icons in result list: icons in the result list can
|
434 |
list can be turned off. They take quite a lot of space and
|
411 |
be turned off. They take quite a lot of space and convey relatively
|
435 |
convey relatively little useful information.
|
412 |
little useful information.
|
436 |
|
413 |
|
437 |
Search parameters:
|
414 |
Search parameters:
|
438 |
|
415 |
|
439 |
* Stemming language: stemming obviously depends on the
|
416 |
* Stemming language: stemming obviously depends on the document's
|
440 |
document's language. This listbox will let you choose among the
|
417 |
language. This listbox will let you choose among the stemming databases
|
441 |
stemming databases which were built during indexing (this is
|
418 |
which were built during indexing (this is set in the main
|
442 |
set in the main configuration file), or later added with
|
419 |
configuration file), or later added with recollindex -s (See the
|
443 |
recollindex -s (See the recollindex manual). Stemming
|
420 |
recollindex manual). Stemming languages which are dynamically added
|
444 |
languages which are dynamically added will be deleted at the
|
|
|
445 |
next indexation pass unless they are also added in the
|
421 |
will be deleted at the next indexation pass unless they are also added
|
446 |
configuration file.
|
422 |
in the configuration file.
|
447 |
|
423 |
|
448 |
* Dynamically build abstracts: this decides if Recoll tries to
|
424 |
* Dynamically build abstracts: this decides if Recoll tries to build
|
449 |
build document abstracts when displaying the result list.
|
425 |
document abstracts when displaying the result list. Abstracts are
|
450 |
Abstracts are constructed by taking context from the document
|
426 |
constructed by taking context from the document information, around
|
451 |
information, around the search terms. This can slow down
|
427 |
the search terms. This can slow down result list display significantly
|
452 |
result list display significantly for big documents, and you
|
428 |
for big documents, and you may want to turn it off.
|
453 |
may want to turn it off.
|
|
|
454 |
|
429 |
|
455 |
* Replace abstracts from documents: this decides if we should
|
430 |
* Replace abstracts from documents: this decides if we should synthesize
|
456 |
synthesize and display an abstract in place of an explicit
|
431 |
and display an abstract in place of an explicit abstract found within
|
457 |
abstract found within the document itself.
|
432 |
the document itself.
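
To make the "Dynamically build abstracts" option more concrete, here is a minimal Python sketch of the general idea: take a little context around each occurrence of a search term and join the fragments. This is not Recoll's actual code. The function name build_abstract and the width and max_fragments parameters are hypothetical, chosen only for this illustration.

```python
import re

def build_abstract(text, terms, width=30, max_fragments=3):
    """Join short context fragments taken around occurrences of the terms.

    Hypothetical helper: scanning the whole text like this is also why
    such abstracts can be slow to build for big documents.
    """
    fragments = []
    for term in terms:
        # Case-insensitive literal match of the search term.
        for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            start = max(0, match.start() - width)
            end = min(len(text), match.end() + width)
            fragments.append(text[start:end].strip())
            if len(fragments) >= max_fragments:
                return " ... ".join(fragments)
    return " ... ".join(fragments)

if __name__ == "__main__":
    sample = "Recoll is a personal full text search tool. It is based on Xapian."
    print(build_abstract(sample, ["xapian"], width=12))
```

The cost grows with document size and with the number of terms, which is consistent with the advice to turn the option off when result list display becomes slow.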
|
458 |
|
433 |
|
459 |
--------------------------------------------------------------
|
434 |
----------------------------------------------------------------------
|
460 |
|
435 |
|
461 |
Chapter 4. Installation
|
436 |
Chapter 4. Installation
|
462 |
|
437 |
|
463 |
4.1. Building from source
|
438 |
4.1. Building from source
|
464 |
|
439 |
|
465 |
4.1.1. Prerequisites
|
440 |
4.1.1. Prerequisites
|
466 |
|
441 |
|
467 |
At the very least, you will need to download and install the
|
442 |
At the very least, you will need to download and install the xapian core
|
468 |
xapian core package (Recoll currently uses version 0.9.2), and the
|
443 |
package (Recoll currently uses version 0.9.2), and the qt runtime and
|
469 |
qt runtime and development packages (Recoll development currently
|
444 |
development packages (Recoll development currently uses version 3.3.5, but
|
470 |
uses version 3.3.5, but any 3.3 version is probably ok).
|
445 |
any 3.3 version is probably ok).
|
471 |
|
446 |
|
472 |
You will most probably be able to find a binary package for qt for
|
447 |
You will most probably be able to find a binary package for qt for your
|
473 |
your system. You may have to compile Xapian but this is not
|
448 |
system. You may have to compile Xapian but this is not difficult (if you
|
474 |
difficult (if you are using FreeBSD, there is a port).
|
449 |
are using FreeBSD, there is a port).
|
475 |
|
450 |
|
476 |
You may also need libiconv. Recoll currently uses version 1.9
|
451 |
You may also need libiconv. Recoll currently uses version 1.9 (this should
|
477 |
(this should not be critical). On Linux systems, the iconv
|
452 |
not be critical). On Linux systems, the iconv interface is part of libc
|
478 |
interface is part of libc and you should not need to do anything
|
453 |
and you should not need to do anything special.
|
479 |
special.
|
|
|
480 |
|
454 |
|
481 |
External file types. Recoll uses external applications to index
|
455 |
External file types. Recoll uses external applications to index some file
|
482 |
some file types. You need to install them for the file types that
|
456 |
types. You need to install them for the file types that you wish to have
|
483 |
you wish to have indexed (these are run-time dependencies. None is
|
457 |
indexed (these are run-time dependencies. None is needed for building
|
484 |
needed for building Recoll):
|
458 |
Recoll):
|
485 |
|
459 |
|
486 |
* PDF: pdftotext is part of the Xpdf package.
|
460 |
* PDF: pdftotext is part of the Xpdf package.
|
487 |
|
461 |
|
488 |
* Postscript: pstotext.
|
462 |
* Postscript: pstotext.
|
489 |
|
463 |
|
|
... |
|
... |
493 |
|
467 |
|
494 |
* dvi: dvips
|
468 |
* dvi: dvips
|
495 |
|
469 |
|
496 |
* djvu: DjVuLibre
|
470 |
* djvu: DjVuLibre
|
497 |
|
471 |
|
498 |
* MP3: Recoll will use the id3info command from the id3lib
|
472 |
* MP3: Recoll will use the id3info command from the id3lib package to
|
499 |
package to extract tag information. Without it, only the
|
473 |
extract tag information. Without it, only the filenames will be
|
500 |
filenames will be indexed.
|
474 |
indexed.
|
501 |
|
475 |
|
502 |
Text, Html, mail folders and Openoffice files are processed
|
476 |
Text, Html, mail folders and Openoffice files are processed internally.
|
503 |
internally.
|
|
|
504 |
|
477 |
|
505 |
--------------------------------------------------------------
|
478 |
----------------------------------------------------------------------
|
506 |
|
479 |
|
507 |
4.1.2. Building
|
480 |
4.1.2. Building
|
508 |
|
481 |
|
509 |
Recoll has been built on Linux (redhat7.3, mandriva 2005, Fedora
|
482 |
Recoll has been built on Linux (redhat7.3, mandriva 2005, Fedora Core 3),
|
510 |
Core 3), FreeBSD and Solaris 8. If you build on another system, I
|
483 |
FreeBSD and Solaris 8. If you build on another system, I would very much
|
511 |
would very much welcome patches.
|
484 |
welcome patches.
|
512 |
|
485 |
|
513 |
Depending on the qt configuration on your system, you may have to
|
486 |
Depending on the qt configuration on your system, you may have to set the
|
514 |
set the QTDIR and QMAKESPECS variables in your environment:
|
487 |
QTDIR and QMAKESPECS variables in your environment:
|
515 |
|
488 |
|
516 |
* QTDIR should point to the directory above the one that holds
|
489 |
* QTDIR should point to the directory above the one that holds the qt
|
517 |
the qt include files (ie: qt.h).
|
490 |
include files (ie: qt.h).
|
518 |
|
491 |
|
519 |
* QMAKESPECS should be set to the name of one of the qt mkspecs
|
492 |
* QMAKESPECS should be set to the name of one of the qt mkspecs
|
520 |
subdirectories (ie: linux-g++).
|
493 |
subdirectories (ie: linux-g++).
|
521 |
|
494 |
|
522 |
On many Linux systems, QTDIR is set by the login scripts, and
|
495 |
On many Linux systems, QTDIR is set by the login scripts, and QMAKESPECS
|
523 |
QMAKESPECS is not needed because there is a default link in
|
496 |
is not needed because there is a default link in mkspecs/.
|
524 |
mkspecs/.
|
|
|
525 |
|
497 |
|
526 |
The Recoll configure script does a better job of checking these
|
498 |
The Recoll configure script does a better job of checking these variables
|
527 |
variables after release 1.1.1. Before this, unexplained errors
|
499 |
after release 1.1.1. Before this, unexplained errors will occur during
|
528 |
will occur during compilation if the environment is not set up.
|
500 |
compilation if the environment is not set up. Also, for 1.1.0 the qmake
|
529 |
Also, for 1.1.0 the qmake command should be in your PATH (later
|
501 |
command should be in your PATH (later releases can also find it in
|
530 |
releases can also find it in $QTDIR/bin).
|
502 |
$QTDIR/bin).
|
531 |
|
503 |
|
532 |
Normal procedure:
|
504 |
Normal procedure:
|
533 |
|
505 |
|
534 |
cd recoll-xxx
|
506 |
cd recoll-xxx
|
535 |
configure
|
507 |
configure
|
536 |
make
|
508 |
make
|
537 |
(practises usual hardship-repelling invocations)
|
509 |
(practises usual hardship-repelling invocations)
|
538 |
|
510 |
|
539 |
|
511 |
|
540 |
There is little autoconfiguration. The configure script will mainly
|
512 |
There is little autoconfiguration. The configure script will mainly link one
|
541 |
link one of the system-specific files in the mk directory to
|
513 |
of the system-specific files in the mk directory to mk/sysconf. If your
|
542 |
mk/sysconf. If your system is not known yet, it will tell you as
|
514 |
system is not known yet, it will tell you as much, and you may want to
|
543 |
much, and you may want to manually copy and modify one of the
|
515 |
manually copy and modify one of the existing files (the new file name
|
544 |
existing files (the new file name should be the output of uname
|
516 |
should be the output of uname -s).
|
545 |
-s).
|
|
|
546 |
|
517 |
|
547 |
--------------------------------------------------------------
|
518 |
----------------------------------------------------------------------
|
548 |
|
519 |
|
549 |
4.1.3. Installation
|
520 |
4.1.3. Installation
|
550 |
|
521 |
|
551 |
Either type make install or execute recollinstall prefix, in the
|
522 |
Either type make install or execute recollinstall prefix, in the root of
|
552 |
root of the source tree. This will copy the commands to prefix/bin
|
523 |
the source tree. This will copy the commands to prefix/bin and the sample
|
553 |
and the sample configuration files, scripts and other shared data
|
524 |
configuration files, scripts and other shared data to prefix/share/recoll.
|
554 |
to prefix/share/recoll.
|
|
|
555 |
|
525 |
|
556 |
You can then proceed to configuration.
|
526 |
You can then proceed to configuration.
|
557 |
|
527 |
|
558 |
--------------------------------------------------------------
|
528 |
----------------------------------------------------------------------
|
559 |
|
529 |
|
560 |
4.2. Installing a prebuilt copy
|
530 |
4.2. Installing a prebuilt copy
|
561 |
|
531 |
|
562 |
4.2.1. Installing through a package system
|
532 |
4.2.1. Installing through a package system
|
563 |
|
533 |
|
564 |
If you are lucky enough to be using a port system or a prebuilt
|
534 |
If you are lucky enough to be using a port system or a prebuilt package
|
565 |
package (RPM or other), just follow the usual procedure, and have
|
535 |
(RPM or other), just follow the usual procedure, and have a look at the
|
566 |
a look at the configuration section.
|
536 |
configuration section.
|
567 |
|
537 |
|
568 |
--------------------------------------------------------------
|
538 |
----------------------------------------------------------------------
|
569 |
|
539 |
|
570 |
4.2.2. Installing a prebuilt Recoll
|
540 |
4.2.2. Installing a prebuilt Recoll
|
571 |
|
541 |
|
572 |
The unpackaged binary versions are just compressed tar files of a
|
542 |
The unpackaged binary versions are just compressed tar files of a build
|
573 |
build tree, where only the useful parts were kept (executables and
|
543 |
tree, where only the useful parts were kept (executables and sample
|
574 |
sample configuration).
|
544 |
configuration).
|
575 |
|
545 |
|
576 |
The executable binary files are built with a static link to
|
546 |
The executable binary files are built with a static link to libxapian and
|
577 |
libxapian and libiconv, to make installation easier (no
|
547 |
libiconv, to make installation easier (no dependencies). However, this
|
578 |
dependencies). However, this also means that you cannot change the
|
548 |
also means that you cannot change the versions which are used.
|
579 |
versions which are used.
|
|
|
580 |
|
549 |
|
581 |
After extracting the tar file, you can proceed with installation
|
550 |
After extracting the tar file, you can proceed with installation as if you
|
582 |
as if you had built the package from source.
|
551 |
had built the package from source.
|
583 |
|
552 |
|
584 |
--------------------------------------------------------------
|
553 |
----------------------------------------------------------------------
|
585 |
|
554 |
|
586 |
4.3. Configuration overview
|
555 |
4.3. Configuration overview
|
587 |
|
556 |
|
588 |
There are two sets of configuration files. The system-wide files
|
557 |
There are two sets of configuration files. The system-wide files are kept
|
589 |
are kept in a directory named like
|
558 |
in a directory named like /usr/[local/]share/recoll/examples, they define
|
590 |
/usr/[local/]share/recoll/examples, they define default values for
|
|
|
591 |
the system. A parallel set of files exists in the .recoll
|
559 |
default values for the system. A parallel set of files exists in the
|
592 |
directory in your home (this can be changed with the
|
560 |
.recoll directory in your home (this can be changed with the
|
593 |
RECOLL_CONFDIR environment variable). The database is also kept in
|
561 |
RECOLL_CONFDIR environment variable). The database is also kept in .recoll
|
594 |
.recoll by default, (this can be changed by a configuration
|
562 |
by default, (this can be changed by a configuration parameter).
|
595 |
parameter).
|
|
|
596 |
|
563 |
|
597 |
If the .recoll directory does not exist when recoll or recollindex
|
564 |
If the .recoll directory does not exist when recoll or recollindex are
|
598 |
are started, it will be created with a set of empty configuration
|
565 |
started, it will be created with a set of empty configuration files.
|
599 |
files. recoll will give you a chance to edit the configuration
|
566 |
recoll will give you a chance to edit the configuration file before
|
600 |
file before starting indexation. recollindex will proceed
|
567 |
starting indexation. recollindex will proceed immediately.
|
601 |
immediately.
|
|
|
602 |
|
568 |
|
603 |
Most of the parameters specific to the recoll GUI are set through
|
569 |
Most of the parameters specific to the recoll GUI are set through the
|
604 |
the Preferences menu and stored in the standard QT place
|
570 |
Preferences menu and stored in the standard QT place ($HOME/.qt/recollrc).
|
605 |
($HOME/.qt/recollrc). You probably do not want to edit this by
|
571 |
You probably do not want to edit this by hand.
|
606 |
hand.
|
|
|
607 |
|
572 |
|
608 |
For other options, Recoll uses text configuration files. You will
|
573 |
For other options, Recoll uses text configuration files. You will have to
|
609 |
have to edit them by hand for now (there is still some hope for a
|
574 |
edit them by hand for now (there is still some hope for a GUI
|
610 |
GUI configuration tool in the future). The most accurate
|
575 |
configuration tool in the future). The most accurate documentation for the
|
611 |
documentation for the configuration parameters is given by
|
576 |
configuration parameters is given by comments inside the default files,
|
612 |
comments inside the default files, and we will just give a general
|
577 |
and we will just give a general overview here.
|
613 |
overview here.
|
|
|
614 |
|
578 |
|
615 |
All configuration files share the same format. For example, a
|
579 |
All configuration files share the same format. For example, a short
|
616 |
short extract of the main configuration file might look as
|
580 |
extract of the main configuration file might look as follows:
|
617 |
follows:
|
|
|
618 |
|
581 |
|
619 |
# Space-separated list of directories to index.
|
582 |
# Space-separated list of directories to index.
|
620 |
topdirs = ~/docs /usr/share/doc
|
583 |
topdirs = ~/docs /usr/share/doc
|
621 |
|
584 |
|
622 |
[~/somedirectory-with-utf8-txt-files]
|
585 |
[~/somedirectory-with-utf8-txt-files]
|
623 |
defaultcharset = utf-8
|
586 |
defaultcharset = utf-8
|
624 |
|
587 |
|
625 |
|
588 |
|
626 |
There are three kinds of lines:
|
589 |
There are three kinds of lines:
|
627 |
|
590 |
|
628 |
* Comment (starts with #) or empty.
|
591 |
* Comment (starts with #) or empty.
|
629 |
|
592 |
|
630 |
* Parameter assignment (name = value).
|
593 |
* Parameter assignment (name = value).
|
631 |
|
594 |
|
632 |
* Section definition ([somedirname]).
|
595 |
* Section definition ([somedirname]).
|
633 |
|
596 |
|
634 |
Section lines allow redefining some parameters for a directory
|
597 |
Section lines allow redefining some parameters for a directory subtree.
|
635 |
subtree. Some of the parameters used for indexation are looked up
|
598 |
Some of the parameters used for indexation are looked up hierarchically
|
636 |
hierarchically from the more to the less specific. Not all
|
599 |
from the more to the less specific. Not all parameters can be meaningfully
|
637 |
parameters can be meaningfully redefined, this is specified for
|
600 |
redefined, this is specified for each in the next section.
|
638 |
each in the next section.
|
|
|
639 |
|
601 |
|
640 |
The tilde character (~) is expanded in file names to the name of
|
602 |
The tilde character (~) is expanded in file names to the name of the
|
641 |
the user's home directory.
|
603 |
user's home directory.
|
642 |
|
604 |
|
643 |
White space is used for separation inside lists. Elements with
|
605 |
White space is used for separation inside lists. Elements with embedded
|
644 |
embedded spaces can be quoted using double-quotes.
|
606 |
spaces can be quoted using double-quotes.
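
The format rules above (three kinds of lines, per-subtree sections, tilde expansion, quoted list elements) can be sketched in a few lines of Python. This is an illustration of the described format only, not Recoll's own parser. All function names are hypothetical, and the lookup rule (longest matching section wins, then the top level) is an assumption about what "from the more to the less specific" means.

```python
import os
import shlex

def expand_tilde(path, home):
    """Replace a leading ~ with the given home directory."""
    if path == "~" or path.startswith("~/"):
        return home + path[1:]
    return path

def parse_config(text, home=None):
    """Parse the text into {section: {name: value}}; top level is section ""."""
    home = home or os.path.expanduser("~")
    sections = {"": {}}
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comment or empty line
        if line.startswith("[") and line.endswith("]"):
            current = expand_tilde(line[1:-1].strip(), home)  # section definition
            sections.setdefault(current, {})
            continue
        name, sep, value = line.partition("=")
        if sep:  # parameter assignment
            sections[current][name.strip()] = value.strip()
    return sections

def split_list(value, home):
    """Split a white-space separated list; double quotes protect embedded spaces."""
    return [expand_tilde(item, home) for item in shlex.split(value)]

def lookup(sections, name, path):
    """Assumed rule: the longest section that contains path wins, then top level."""
    best = None
    for sec, params in sections.items():
        inside = path == sec or path.startswith(sec.rstrip("/") + "/")
        if sec and inside and name in params:
            if best is None or len(sec) > len(best):
                best = sec
    if best is not None:
        return sections[best][name]
    return sections[""].get(name)

if __name__ == "__main__":
    sample = (
        "# Space-separated list of directories to index.\n"
        "topdirs = ~/docs /usr/share/doc\n"
        "\n"
        "[~/somedirectory-with-utf8-txt-files]\n"
        "defaultcharset = utf-8\n"
    )
    conf = parse_config(sample, home="/home/me")
    print(split_list(conf[""]["topdirs"], "/home/me"))
    print(lookup(conf, "defaultcharset",
                 "/home/me/somedirectory-with-utf8-txt-files/notes.txt"))
```

The sample text is the extract shown earlier in this section. The home directory /home/me is a placeholder passed in explicitly so that the result does not depend on the account running the script.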
|
645 |
|
607 |
|
646 |
--------------------------------------------------------------
|
608 |
----------------------------------------------------------------------
|
647 |
|
609 |
|
648 |
4.3.1. Main configuration file
|
610 |
4.3.1. Main configuration file
|
649 |
|
611 |
|
650 |
recoll.conf is the main configuration file. It defines things like
|
612 |
recoll.conf is the main configuration file. It defines things like what to
|
651 |
what to index (top directories and things to ignore), and the
|
613 |
index (top directories and things to ignore), and the default character
|
652 |
default character set to use for document types which do not
|
614 |
set to use for document types which do not specify it internally.
|
653 |
specify it internally.
|
|
|
654 |
|
615 |
|
655 |
The default configuration will index your home directory. If this
|
616 |
The default configuration will index your home directory. If this is not
|
656 |
is not appropriate, use recoll to copy the sample configuration,
|
617 |
appropriate, use recoll to copy the sample configuration, click Cancel,
|
657 |
click Cancel, and edit the configuration file before restarting
|
618 |
and edit the configuration file before restarting the command. This will
|
658 |
the command. This will start the initial indexation, which may
|
619 |
start the initial indexation, which may take some time.
|
659 |
take some time.
|
|
|
660 |
|
620 |
|
661 |
Parameters:
|
621 |
Parameters:
|
662 |
|
622 |
|
663 |
topdirs
|
623 |
topdirs
|
664 |
|
624 |
|
665 |
Specifies the list of directories to index (recursively).
|
625 |
Specifies the list of directories to index (recursively).
|
666 |
|
626 |
|
667 |
skippedNames
|
627 |
skippedNames
|
668 |
|
628 |
|
669 |
A space-separated list of patterns for names of files or
|
629 |
A space-separated list of patterns for names of files or
|
670 |
directories that should be completely ignored. The list
|
630 |
directories that should be completely ignored. The list defined in
|
671 |
defined in the default file is:
|
631 |
the default file is:
|
672 |
|
632 |
|
673 |
*~ #* bin CVS Cache caughtspam tmp
|
633 |
*~ #* bin CVS Cache caughtspam tmp
|
674 |
|
634 |
|
675 |
The list can be redefined for subdirectories, but is only
|
635 |
The list can be redefined for subdirectories, but is only actually
|
676 |
actually changed for the top level ones in topdirs.
|
636 |
changed for the top level ones in topdirs.
|
677 |
|
637 |
|
678 |
The top-level directories are not affected by this list
|
638 |
The top-level directories are not affected by this list (that is,
|
679 |
(that is, a directory in topdirs might match and would
|
639 |
a directory in topdirs might match and would still be indexed).
|
680 |
still be indexed).
|
|
|
681 |
|
640 |
|
682 |
The list in the default configuration does not exclude
|
641 |
The list in the default configuration does not exclude hidden
|
683 |
hidden directories (names beginning with a dot), which
|
642 |
directories (names beginning with a dot), which means that it may
|
684 |
means that it may index quite a few things that you do not
|
643 |
index quite a few things that you do not want. On the other hand,
|
685 |
want. On the other hand, mail user agents like thunderbird
|
644 |
mail user agents like thunderbird usually store messages in hidden
|
686 |
usually store messages in hidden directories, and you
|
645 |
directories, and you probably want this indexed. One possible
|
687 |
probably want this indexed. One possible solution is to
|
|
|
688 |
have .* in skippedNames, and add things like
|
646 |
solution is to have .* in skippedNames, and add things like
|
689 |
~/.thunderbird or ~/.evolution in topdirs.
|
647 |
~/.thunderbird or ~/.evolution in topdirs.
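
As a rough illustration of how the skippedNames patterns behave, the following Python sketch matches single file or directory names against the default list. It assumes shell-style wildcard semantics (as provided by Python's fnmatch module) and matching against the last path element only; the manual does not spell out either point. The helper names are hypothetical.

```python
from fnmatch import fnmatchcase

# The default list quoted above, split on white space.
DEFAULT_SKIPPED = "*~ #* bin CVS Cache caughtspam tmp".split()

def is_skipped(name, patterns=DEFAULT_SKIPPED):
    """True if a single file or directory name matches one of the patterns."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)

def should_visit(name, is_topdir, patterns=DEFAULT_SKIPPED):
    """Directories listed in topdirs are visited even if their name matches."""
    return is_topdir or not is_skipped(name, patterns)

if __name__ == "__main__":
    for name in ("notes.txt~", "#draft#", "CVS", "binary", ".thunderbird"):
        print(name, is_skipped(name))
    # Adding ".*" skips hidden names, but an explicit topdirs entry still passes.
    hidden_too = DEFAULT_SKIPPED + [".*"]
    print(should_visit(".thunderbird", is_topdir=False, patterns=hidden_too))
    print(should_visit(".thunderbird", is_topdir=True, patterns=hidden_too))
```

The last two calls mirror the suggested setup: with .* in skippedNames, hidden directories are ignored unless they are named explicitly in topdirs.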
|
690 |
|
648 |
|
691 |
loglevel
|
649 |
loglevel
|
692 |
|
650 |
|
693 |
Verbosity level for recoll and recollindex. A value of 4
|
651 |
Verbosity level for recoll and recollindex. A value of 4 lists
|
694 |
lists quite a lot of debug/information messages. 2 only
|
652 |
quite a lot of debug/information messages. 2 only lists errors.
|
695 |
lists errors.
|
|
|
696 |
|
653 |
|
697 |
logfilename
|
654 |
logfilename
|
698 |
|
655 |
|
699 |
Where should the messages go. 'stderr' can be used as a
|
656 |
Where should the messages go. 'stderr' can be used as a special
|
700 |
special value.
|
657 |
value.
|
701 |
|
658 |
|
702 |
filtersdir
|
659 |
filtersdir
|
703 |
|
660 |
|
704 |
A directory to search for the external filter scripts used
|
661 |
A directory to search for the external filter scripts used to
|
705 |
to index some types of files. The value should not be
|
662 |
index some types of files. The value should not be changed, except
|
706 |
changed, except if you want to modify one of the default
|
663 |
if you want to modify one of the default scripts. The value can be
|
707 |
scripts. The value can be redefined for any subdirectory.
|
664 |
redefined for any subdirectory.
|
708 |
|
665 |
|
709 |
indexstemminglanguages
|
666 |
indexstemminglanguages
|
710 |
|
667 |
|
711 |
A list of languages for which the stem expansion databases
|
668 |
A list of languages for which the stem expansion databases will be
|
712 |
will be built. See recollindex(1) for possible values. You
|
669 |
built. See recollindex(1) for possible values. You can add a stem
|
713 |
can add a stem expansion database for a different language
|
670 |
expansion database for a different language by using recollindex
|
714 |
by using recollindex -s, but it will be deleted during the
|
671 |
-s, but it will be deleted during the next indexation. Only
|
715 |
next indexation. Only languages listed in the
|
|
|
716 |
configuration file are permanent.
|
672 |
languages listed in the configuration file are permanent.
|
717 |
|
673 |
|
718 |
iconsdir
|
674 |
iconsdir
|
719 |
|
675 |
|
720 |
The name of the directory where recoll result list icons
|
676 |
The name of the directory where recoll result list icons are
|
721 |
are stored. You can change this if you want different
|
677 |
stored. You can change this if you want different images.
|
722 |
images.
|
|
|
723 |
|
678 |
|
724 |
dbdir
|
679 |
dbdir
|
725 |
|
680 |
|
726 |
The name of the Xapian database directory. It will be
|
681 |
The name of the Xapian database directory. It will be created if
|
727 |
created if needed when the database is initialized.
|
682 |
needed when the database is initialized.
|
728 |
|
683 |
|
729 |
defaultcharset
|
684 |
defaultcharset
|
730 |
|
685 |
|
731 |
The name of the character set used for files that do not
|
686 |
The name of the character set used for files that do not contain a
|
732 |
contain a character set definition (ie: plain text files).
|
687 |
character set definition (ie: plain text files). This can be
|
733 |
This can be redefined for any subdirectory. If it is not
|
688 |
redefined for any subdirectory. If it is not set at all, the
|
734 |
set at all, the character set used is the one defined by
|
689 |
character set used is the one defined by the nls environment
|
735 |
the nls environment (LC_ALL, LC_CTYPE, LANG), or iso8859-1
|
690 |
(LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
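
The fallback order for defaultcharset can be sketched as follows. This is only an illustration in Python, not Recoll's code: the function name is hypothetical, and extracting the character set from the part of the locale name after the dot (as in fr_FR.UTF-8) is an assumption about how the nls environment is interpreted.

```python
import os

def effective_charset(configured=None, environ=None):
    """Return the configured charset, else one derived from the environment.

    The first of LC_ALL, LC_CTYPE, LANG that is set decides; if it names
    no character set, or none is set, fall back to iso8859-1.
    """
    environ = os.environ if environ is None else environ
    if configured:
        return configured
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(var)
        if value:
            # Locale names look like "fr_FR.UTF-8" or "en_US.ISO-8859-15@euro".
            _, dot, rest = value.partition(".")
            if dot:
                return rest.split("@", 1)[0]
            break  # variable is set but carries no charset, e.g. "C"
    return "iso8859-1"

if __name__ == "__main__":
    print(effective_charset("utf-8", {}))
    print(effective_charset(None, {"LANG": "fr_FR.UTF-8"}))
    print(effective_charset(None, {}))
```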
|
736 |
if nothing is set.
|
|
|
737 |
|
691 |
|
738 |
guesscharset
|
692 |
guesscharset
|
739 |
|
693 |
|
740 |
Decide if we try to guess the character set of files if no
|
694 |
Decide if we try to guess the character set of files if no
|
741 |
internal value is available (ie: for plain text files).
|
695 |
internal value is available (ie: for plain text files). This does
|
742 |
This does not work well in general, and should probably
|
696 |
not work well in general, and should probably not be used.
|
743 |
not be used.
|
|
|
744 |
|
697 |
|
745 |
usesystemfilecommand
|
698 |
usesystemfilecommand
|
746 |
|
699 |
|
747 |
Decide if we use the file -i system command as a final
|
700 |
Decide if we use the file -i system command as a final step for
|
748 |
step for determining the mime type for a file (the main
|
701 |
determining the mime type for a file (the main procedure uses
|
749 |
procedure uses suffix associations as defined in the
|
702 |
suffix associations as defined in the mimemap file). This can be
|
750 |
mimemap file). This can be useful for files with
|
|
|
751 |
suffixless names, but it will also cause the indexation of
|
703 |
useful for files with suffixless names, but it will also cause the
|
752 |
many bogus "text" files.
|
704 |
indexation of many bogus "text" files.
|
753 |
|
705 |
|
754 |
indexallfilenames
|
706 |
indexallfilenames
|
755 |
|
707 |
|
756 |
Recoll indexes file names in a special section of the
|
708 |
Recoll indexes file names in a special section of the database to
|
757 |
database to allow specific file names searches using wild
|
709 |
allow specific file names searches using wild cards. This
|
758 |
cards. This parameter decides if file name indexing is
|
710 |
parameter decides if file name indexing is performed only for
|
759 |
performed only for files with mime types that would
|
711 |
files with mime types that would qualify them for full text
|
760 |
qualify them for full text indexation, or for all files
|
712 |
indexation, or for all files inside the selected subtrees,
|
761 |
inside the selected subtrees, independent of mime type.
|
713 |
independent of mime type.
|
762 |
|
714 |
|
763 |
--------------------------------------------------------------
|
715 |
----------------------------------------------------------------------
|
764 |
|
716 |
|
765 |
4.3.2. The mimemap file
|
717 |
4.3.2. The mimemap file
|
766 |
|
718 |
|
767 |
mimemap specifies the file name extension to mime type mappings.
|
719 |
mimemap specifies the file name extension to mime type mappings.
|
768 |
|
720 |
|
769 |
For file names without an extension, or with an unknown one, the
|
721 |
For file names without an extension, or with an unknown one, the system's
|
770 |
system's file -i command will be executed to determine the mime
|
722 |
file -i command will be executed to determine the mime type (this can be
|
771 |
type (this can be switched off inside the main configuration
|
723 |
switched off inside the main configuration file).
|
772 |
file).
|
|
|
773 |
|
724 |
|
774 |
mimemap also has a list of extensions which should be ignored
|
725 |
mimemap also has a list of extensions which should be ignored totally (to
|
775 |
totally (to avoid losing time by executing file for things that
|
726 |
avoid losing time by executing file for things that certainly should not
|
776 |
certainly should not be indexed).
|
727 |
be indexed).
|
777 |
|
728 |
|
778 |
The mappings can be specified on a per-subtree basis, which may be
|
729 |
The mappings can be specified on a per-subtree basis, which may be useful
|
779 |
useful in some cases. Example: gaim logs have a .txt extension but
|
730 |
in some cases. Example: gaim logs have a .txt extension but should be
|
780 |
should be handled specially, which is possible because they are
|
731 |
handled specially, which is possible because they are usually all located
|
781 |
usually all located in one place.
|
732 |
in one place.
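
The lookup order described above (per-subtree mapping, then global mapping, then file -i as a last resort) can be sketched as follows. This is only an illustration, not Recoll's actual code: the dictionaries, the function names, the /home/me/.gaim path and the text/x-gaim-log type are all assumptions made for the example, and the -b flag is added to file only to make its output easier to parse.

```python
import os
import subprocess

# Hypothetical in-memory stand-in for mimemap: global suffix mappings plus
# per-subtree overrides. Paths and MIME type names are illustrative assumptions.
GLOBAL_MAP = {".txt": "text/plain", ".html": "text/html"}
SUBTREE_MAPS = {"/home/me/.gaim": {".txt": "text/x-gaim-log"}}


def mime_from_suffix(path):
    """Return the mapped MIME type for path, or None if the suffix is unknown."""
    suffix = os.path.splitext(path)[1].lower()
    if not suffix:
        return None
    # Check the longest subtree first so a deeper override beats a shallower one.
    for root in sorted(SUBTREE_MAPS, key=len, reverse=True):
        if path.startswith(root.rstrip("/") + "/"):
            if suffix in SUBTREE_MAPS[root]:
                return SUBTREE_MAPS[root][suffix]
    return GLOBAL_MAP.get(suffix)


def mime_type(path, use_file_command=True):
    """Suffix lookup first; optionally fall back to the system file command."""
    mapped = mime_from_suffix(path)
    if mapped is not None or not use_file_command:
        return mapped
    try:
        # Typical output looks like "text/plain; charset=us-ascii".
        out = subprocess.run(
            ["file", "-b", "-i", path],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.split(";")[0].strip() or None
```

With these assumed tables, a .txt file below /home/me/.gaim gets the special type, while the same suffix elsewhere maps to text/plain. Passing use_file_command=False mimics switching the fallback off in the main configuration file.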
|
782 |
|
733 |
|
783 |
mimemap also has a recoll_noindex variable which is a list of
|
734 |
mimemap also has a recoll_noindex variable which is a list of suffixes.
|
784 |
suffixes. Matching files will be skipped (avoids unnecessary
|
735 |
Matching files will be skipped (avoids unnecessary decompressions or file
|
785 |
decompressions or file executions). This is partially redundant
|
736 |
executions). This is partially redundant with skippedNames in the main
|
786 |
with skippedNames in the main configuration file, with two
|
737 |
configuration file, with two differences: it will not affect directories,
|
787 |
differences: it will not affect directories, and it can be changed
|
738 |
and it can be changed for any subdirectory.
|
788 |
for any subdirectory.
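
The recoll_noindex behaviour can be pictured with a minimal sketch. It is a guess at the logic from the description above, not the real implementation: the suffix list and the function name are made up for the example, and the per-subdirectory variation is left to the caller, who would pass a different list.

```python
# Hypothetical value for the recoll_noindex list; these suffixes are examples only.
RECOLL_NOINDEX = (".tar.gz", ".o", ".so")


def skipped_by_noindex(name, is_directory, noindex=RECOLL_NOINDEX):
    """Return True if a file name ends with one of the listed suffixes.

    Directories are never skipped here, mirroring the stated difference
    from skippedNames.
    """
    if is_directory:
        return False
    return name.endswith(tuple(noindex))
```

A file named backup.tar.gz would be skipped before any decompression or file execution is attempted, whereas a directory with a matching name would still be visited.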
|
|
|
789 |
|
739 |
|
790 |
--------------------------------------------------------------
|
740 |
----------------------------------------------------------------------
|
791 |
|
741 |
|
792 |
4.3.3. The mimeconf file
|
742 |
4.3.3. The mimeconf file
|
793 |
|
743 |
|
794 |
mimeconf specifies how the different mime types are handled for
|
744 |
mimeconf specifies how the different mime types are handled for
|
795 |
indexation, and for display.
|
745 |
indexation, and for display.
|
796 |
|
746 |
|
797 |
Changing the indexation parameters is probably not a good idea
|
747 |
Changing the indexation parameters is probably not a good idea except if
|
798 |
except if you are a Recoll developer.
|
748 |
you are a Recoll developer.
|
799 |
|
749 |
|
800 |
You may want to adjust the external viewers defined there (e.g. html
|
750 |
You may want to adjust the external viewers defined there (e.g. html is either
|
801 |
is either previewed internally or displayed using firefox, but you
|
751 |
previewed internally or displayed using firefox, but you may prefer
|
802 |
may prefer mozilla, your openoffice.org program might be named
|
752 |
mozilla, your openoffice.org program might be named ooffice instead of
|
803 |
ooffice instead of openoffice ...). Look for the [view] section.
|
753 |
openoffice ...). Look for the [view] section.
|
804 |
|
754 |
|
805 |
You can also change the icons which are displayed by recoll in the
|
755 |
You can also change the icons which are displayed by recoll in the result
|
806 |
result lists (the values are the basenames of the png images
|
756 |
lists (the values are the basenames of the png images inside the iconsdir
|
807 |
inside the iconsdir directory, specified in recoll.conf).
|
757 |
directory, specified in recoll.conf).
|
808 |
|
758 |
|
809 |
--------------------------------------------------------------
|
759 |
----------------------------------------------------------------------
|