|
a/src/README |
|
b/src/README |
|
... |
|
... |
233 |
|
233 |
|
234 |
Recoll stores all internal data in Unicode UTF-8 format, and it can index
|
234 |
Recoll stores all internal data in Unicode UTF-8 format, and it can index
|
235 |
files with different character sets, encodings, and languages into the
|
235 |
files with different character sets, encodings, and languages into the
|
236 |
same index. It has input filters for many document types.
|
236 |
same index. It has input filters for many document types.
|
237 |
|
237 |
|
238 |
Stemming depends on the document language. Recoll stores the unstemmed
|
238 |
Stemming is the process by which Recoll reduces words to their roots so
|
239 |
versions of terms and uses auxiliary databases for term expansion. It can
|
239 |
that searching does not depend, for example, on a word being singular or
|
240 |
switch stemming languages, or add a language, without re-indexing. Storing
|
240 |
plural (floor, floors), or on a verb tense (flooring, floored). Because
|
|
|
241 |
the mechanisms used for stemming depend on the specific grammatical rules
|
|
|
242 |
for each language, there is a separate stemmer module for most common
|
|
|
243 |
languages where stemming makes sense. Storing documents written in
|
241 |
documents in different languages in the same index is possible, and useful
|
244 |
different languages in the same index is possible, and commonly done. In
|
242 |
in practice, but does introduce possibilities of confusion. Recoll
|
245 |
this situation, you can specify several stemming languages for the index.
|
243 |
currently makes no attempt at automatic language recognition.
|
246 |
Recoll stores the unstemmed versions of terms in the main index and uses
|
|
|
247 |
auxiliary databases for term expansion (one for each stemming language),
|
|
|
248 |
which means that you can switch stemming languages between searches, or
|
|
|
249 |
add a language without needing a full reindex. Recoll currently makes no
|
|
|
250 |
attempt at automatic language recognition, which means that the stemmer
|
|
|
251 |
will sometimes be applied to terms from other languages with potentially
|
|
|
252 |
strange results. In practice, even if this introduces possibilities of
|
|
|
253 |
confusion, this approach has been proven quite useful, and, awaiting the
|
|
|
254 |
addition of an automatic language recognition module to Recoll, it is much
|
|
|
255 |
less cumbersome than separating your documents according to what language
|
|
|
256 |
they are written in.
|
244 |
|
257 |
|
245 |
Recoll has many parameters which define exactly what to index, and how to
|
258 |
Recoll has many parameters which define exactly what to index, and how to
|
246 |
classify and decode the source documents. These are kept in configuration
|
259 |
classify and decode the source documents. These are kept in configuration
|
247 |
files. A default configuration is copied into a standard location (usually
|
260 |
files. A default configuration is copied into a standard location (usually
|
248 |
something like /usr/[local/]share/recoll/examples) during installation.
|
261 |
something like /usr/[local/]share/recoll/examples) during installation.
|
249 |
The default parameters from this file may be overridden by values that you
|
262 |
The default values set by the configuration files in this directory may be
|
250 |
set inside your personal configuration, found by default in the .recoll
|
263 |
overridden by values that you set inside your personal configuration,
|
251 |
sub-directory of your home directory. The default configuration will index
|
264 |
found by default in the .recoll sub-directory of your home directory. The
|
252 |
your home directory with default parameters and should be sufficient for
|
265 |
default configuration will index your home directory with default
|
253 |
giving Recoll a try, but you may want to adjust it later, which can be
|
266 |
parameters and should be sufficient for giving Recoll a try, but you may
|
254 |
done either by editing the text files or by using configuration menus in
|
267 |
want to adjust it later, which can be done either by editing the text
|
255 |
the recoll GUI
|
268 |
files or by using configuration menus in the recoll GUI
|
256 |
|
269 |
|
257 |
Indexing is started automatically the first time you execute the recoll
|
270 |
Indexing is started automatically the first time you execute the recoll
|
258 |
search graphical user interface, or by executing the recollindex command.
|
271 |
search graphical user interface, or by executing the recollindex command.
|
259 |
|
272 |
|
260 |
Searches are usually performed inside the recoll graphical user interface
|
273 |
Searches are usually performed inside the recoll graphical user interface
|