a/src/README b/src/README
...
...
38
                2.4. Using cron to automate indexation
38
                2.4. Using cron to automate indexation
39
39
40
   3. Search
40
   3. Search
41
41
42
                3.1. Simple search
42
                3.1. Simple search
43
44
                             3.1.1. Filename search
43
45
44
                3.2. Complex/advanced search
46
                3.2. Complex/advanced search
45
47
46
                3.3. Document history
48
                3.3. Document history
47
49
...
...
108
110
109
   You do not need to remember in what file or email message you stored a
111
   You do not need to remember in what file or email message you stored a
110
   given piece of information. You just ask for related terms, and the tool
112
   given piece of information. You just ask for related terms, and the tool
111
   will return a list of documents where those terms are prominent.
113
   will return a list of documents where those terms are prominent.
112
114
113
   This mode of operation has been made very familiar by www search engines.
115
   This mode of operation has been made very familiar by internet search
116
   engines.
114
117
115
   The notion of relevance is a difficult one, as only you, the user,
118
   The notion of relevance is a difficult one, as only you, the user,
116
   actually know which documents are relevant to your search, and the
119
   actually know which documents are relevant to your search, and the
117
   application can only make a guess. The quality of this guess is probably
120
   application can only make a guess. The quality of this guess is probably
118
   the most important element for a search application.
121
   the most important element for a search application.
...
...
153
   Stemming depends on the document language. Recoll stores the unstemmed
156
   Stemming depends on the document language. Recoll stores the unstemmed
154
   versions of terms and uses auxiliary databases for term expansion. It can
157
   versions of terms and uses auxiliary databases for term expansion. It can
155
   switch stemming languages, or add a language, without reindexing. Storing
158
   switch stemming languages, or add a language, without reindexing. Storing
156
   documents in different languages in the same database is possible, and
159
   documents in different languages in the same database is possible, and
157
   useful in practice, but does introduce possibilities of confusion. Recoll
160
   useful in practice, but does introduce possibilities of confusion. Recoll
158
   makes no attempt at automatic language recognition.
161
   currently makes no attempt at automatic language recognition.
159
162
160
   Recoll has many parameters which define exactly what to index, and how to
163
   Recoll has many parameters which define exactly what to index, and how to
161
   classify and decode the source documents. These are kept in a
164
   classify and decode the source documents. These are kept in a
162
   configuration file. A sample configuration is installed into the .recoll
165
   configuration file. A default configuration is copied into a standard
163
   subdirectory of your home directory when you first execute a Recoll
166
   location (usually something like /usr/[local/]share/recoll/examples)
164
   command. The initial configuration will index your home directory with
167
   during installation. The default parameters from this file may be
165
   default parameters and should be sufficient for giving Recoll a try, but
168
   overridden by values that you set inside your personal configuration, found
166
   you may want to adjust it later.
169
   by default in the .recoll subdirectory of your home directory. The default
170
   configuration will index your home directory with default parameters and
171
   should be sufficient for giving Recoll a try, but you may want to adjust
172
   it later.
167
173
168
   Indexation is started automatically the first time you execute the recoll
174
   Indexation is started automatically the first time you execute the recoll
169
   search graphical user interface, or by executing the recollindex command.
175
   search graphical user interface, or by executing the recollindex command.
170
176
171
   Searches are performed inside the recoll program, which has many options
177
   Searches are performed inside the recoll program, which has many options
...
...
214
220
215
     ----------------------------------------------------------------------
221
     ----------------------------------------------------------------------
216
222
217
2.2. The indexation configuration
223
2.2. The indexation configuration
218
224
219
   The main configuration file is named $HOME/.recoll/recoll.conf by default
225
   Values set in the system-wide configuration file (named like
226
   /usr/[local/]share/recoll/examples/recoll.conf) can be overridden by those
227
   set in the personal one, named $HOME/.recoll/recoll.conf by default or
220
   or $RECOLL_CONFDIR/recoll.conf if RECOLL_CONFDIR is set.
228
   $RECOLL_CONFDIR/recoll.conf if RECOLL_CONFDIR is set.
221
229
222
   The most accurate documentation for editing the file is given by comments
230
   The most accurate documentation for editing the file is given by comments
223
   inside the default file that will be created when you first start recoll.
231
   inside the central one. If you want to adjust the configuration before
224
   If you want to adjust the configuration before indexation, just click
232
   indexation, just click Cancel when the program asks if it should start
225
   Cancel when the program asks if it should start initial indexation.
233
   initial indexation. This will have created a .recoll directory containing
234
   empty configuration files.
226
235
227
   The configuration is also documented inside the installation chapter of
236
   The configuration is also documented inside the installation chapter of
228
   this document, or in the recoll.conf(5) man page.
237
   this document, or in the recoll.conf(5) man page.
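
   The lookup and override rules described in this section can be sketched
   as follows. This is only an illustration, not Recoll's own code: the
   function names are hypothetical, and treating an empty RECOLL_CONFDIR
   like an unset one is an assumption.

```python
import os
import posixpath


def personal_conf_path(environ=None):
    """Return the path of the personal recoll.conf, as described above."""
    env = os.environ if environ is None else environ
    # Assumption: an empty RECOLL_CONFDIR is treated like an unset one.
    confdir = env.get("RECOLL_CONFDIR") or posixpath.join(env.get("HOME", ""), ".recoll")
    return posixpath.join(confdir, "recoll.conf")


def effective_values(system_wide, personal):
    """Personal values override system-wide ones; other defaults are kept."""
    merged = dict(system_wide)
    merged.update(personal)
    return merged
```

   With only HOME set, personal_conf_path() points inside the .recoll
   subdirectory of the home directory; with RECOLL_CONFDIR set, it points
   inside that directory instead.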
229
238
230
     ----------------------------------------------------------------------
239
     ----------------------------------------------------------------------
...
...
281
   checkbox to ensure that only documents with all the terms will be
290
   checkbox to ensure that only documents with all the terms will be
282
   returned. Use the Tools / Advanced search dialog for more complex
291
   returned. Use the Tools / Advanced search dialog for more complex
283
   searches.
292
   searches.
284
293
285
   After starting a search, a list of results will instantly be displayed in
294
   After starting a search, a list of results will instantly be displayed in
286
   the main list window. Clicking on an entry will open an internal preview
295
   the main list window. Clicking on the Preview link for an entry will open
287
   window for the document. Double-clicking will attempt to start an external
296
   an internal preview window for the document. Clicking the Edit link will
288
   viewer (have a look at the ~/.recoll/mimeconf file to see how these are
297
   attempt to start an external viewer (have a look at the mimeconf
289
   configured).
298
   configuration file to see how these are configured).
290
299
291
   By default, the document list is presented in order of relevance (how well
300
   By default, the document list is presented in order of relevance (how well
292
   the system estimates that the document matches the query). You can specify
301
   the system estimates that the document matches the query). You can specify
293
   a different ordering by using the Tools / Sort parameters dialog.
302
   a different ordering by using the Tools / Sort parameters dialog.
294
303
295
   You can click on the first paragraph (Query results or No results found)
304
   You can click on the Query details link at the top of the results page to
296
   in the result list to get an exact display of the query actually
305
   see the query actually performed, after stem expansion and other
297
   performed, after stem expansion and other processing.
306
   processing.
307
308
     ----------------------------------------------------------------------
309
310
  3.1.1. Filename search
311
312
   If the File name checkbox at the left of the search terms is checked, the
313
   search will only be done for file names. In this case you can use the usual
314
   shell wildcard characters * and ? for expanding the search (e.g.
315
   *somestring*).
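
   To get a feel for how the * and ? wildcards behave, here is a small
   sketch using Python's standard fnmatch module. It only illustrates shell
   style matching and is not how Recoll implements file name search; the
   helper name is hypothetical, and whether Recoll matches case-sensitively
   is not stated here (fnmatchcase is case-sensitive and also accepts
   [...] character sets).

```python
from fnmatch import fnmatchcase


def match_file_names(pattern, names):
    """Return the names matching a shell-style wildcard pattern."""
    # '*' matches any run of characters, '?' matches exactly one character.
    return [name for name in names if fnmatchcase(name, pattern)]


names = ["notes_somestring.txt", "somestring", "other.txt", "a.txt", "ab.txt"]
print(match_file_names("*somestring*", names))
print(match_file_names("?.txt", names))
```

   The first call keeps the two names containing somestring; the second
   keeps only a.txt, because ? stands for a single character.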
298
316
299
     ----------------------------------------------------------------------
317
     ----------------------------------------------------------------------
300
318
301
3.2. Complex/advanced search
319
3.2. Complex/advanced search
302
320
303
   The advanced search dialog has fields that will allow a more refined
321
   The advanced search dialog has fields that will allow a more refined
304
   search, looking for documents with all given words, a given exact phrase,
322
   search, looking for documents with all given words, a given exact phrase,
305
   or none of the given words (all relevant fields will be combined by an
323
   none of the given words, or a given file name (with wildcard expansion).
306
   implicit AND clause).
324
   All relevant fields will be combined by an implicit AND clause.
307
325
308
   It will let you search for documents of specific mime types (e.g. only
326
   It will let you search for documents of specific mime types (e.g. only
309
   text/plain, or text/html or application/pdf etc...)
327
   text/plain, or text/html or application/pdf etc...)
310
328
311
   It will let you restrict the search results to a subtree of the indexed
329
   It will let you restrict the search results to a subtree of the indexed
312
   area.
330
   area.
313
331
314
   Click on the Start Search button in the advanced search dialog to start
332
   Click on the Start Search button in the advanced search dialog to start
315
   the search. The button in the main window always performs a simple search.
333
   the search. The button in the main window always performs a simple search.
316
334
317
   Click on the result list header paragraph to see the query expansion.
335
   Click on the Show query details link at the top of the result page to see
336
   the query expansion.
318
337
319
     ----------------------------------------------------------------------
338
     ----------------------------------------------------------------------
320
339
321
3.3. Document history
340
3.3. Document history
322
341
...
...
335
   The tool sorts a specified number of the most relevant documents in the
354
   The tool sorts a specified number of the most relevant documents in the
336
   result list, according to specified criteria. The currently available
355
   result list, according to specified criteria. The currently available
337
   criteria are date and mime type.
356
   criteria are date and mime type.
338
357
339
   The sort parameters stay in effect until they are explicitly reset, or
358
   The sort parameters stay in effect until they are explicitly reset, or
340
   the program exits.
359
   the program exits. An activated sort is indicated in the result list
360
   header.
341
361
342
     ----------------------------------------------------------------------
362
     ----------------------------------------------------------------------
343
363
344
3.5. Search tips, shortcuts
364
3.5. Search tips, shortcuts
345
365
...
...
356
   Query explanation. You can get an exact description of what the query
376
   Query explanation. You can get an exact description of what the query
357
   looked for, including stem expansion, and boolean operators used, by
377
   looked for, including stem expansion, and boolean operators used, by
358
   clicking on the result list header.
378
   clicking on the result list header.
359
379
360
   File names. All file name elements (the broken up file path) are entered
380
   File names. All file name elements (the broken up file path) are entered
361
   as terms during indexation, and you can specify them when searching.
381
   as terms during indexation, and you can specify them as ordinary terms in
382
   normal search fields. Alternatively, you can use specific file name search
383
   which will only look for file names and can use wildcard expansion.
362
384
363
   Quitting. Entering ^Q almost anywhere will close the application.
385
   Quitting. Entering ^Q almost anywhere will close the application.
364
386
365
   Closing previews. Entering ^W in a preview tab will close it (and, for the
387
   Closing previews. Entering ^W in a preview tab will close it (and, for the
366
   last tab, close the preview window).
388
   last tab, close the preview window).
...
...
436
458
437
   External file types. Recoll uses external applications to index some file
459
   External file types. Recoll uses external applications to index some file
438
   types. You need to install them for the file types that you wish to have
460
   types. You need to install them for the file types that you wish to have
439
   indexed:
461
   indexed:
440
462
463
     * PDF: pdftotext is part of the Xpdf package.
464
465
     * Postscript: pstotext.
466
441
     * MS Word: antiword.
467
     * MS Word: antiword.
442
468
443
     * PDF: pdftotext is part of the Xpdf package.
444
445
     * Postscript: pstotext.
446
447
     * RTF: unrtf
469
     * RTF: unrtf
470
471
     * dvi: dvips
472
473
     * djvu: DjVuLibre
474
475
   Text, HTML, mail folders and OpenOffice files are processed internally.
448
476
449
     ----------------------------------------------------------------------
477
     ----------------------------------------------------------------------
450
478
451
  4.1.2. Building
479
  4.1.2. Building
452
480
...
...
523
551
524
     ----------------------------------------------------------------------
552
     ----------------------------------------------------------------------
525
553
526
4.3. Configuration overview
554
4.3. Configuration overview
527
555
528
   The personal configuration files and the database are normally kept in the
556
   There are two sets of configuration files. The system-wide files are kept
557
   in a directory named like /usr/[local/]share/recoll/examples; they define
558
   default values for the system. A parallel set of files exists in the
529
   .recoll directory in your home (this can be changed with the
559
   .recoll directory in your home (this can be changed with the
530
   RECOLL_CONFDIR environment variable, and a parameter inside the main
560
   RECOLL_CONFDIR environment variable). The database is also kept in .recoll
531
   configuration file). If this directory does not exist when recoll or
561
   by default (this can be changed by a configuration parameter).
532
   recollindex are started, the directory will be created and the sample
562
533
   configuration files will be copied. recoll will give you a chance to edit
563
   If the .recoll directory does not exist when recoll or recollindex are
534
   the configuration file before starting indexation. recollindex will
564
   started, it will be created with a set of empty configuration files.
535
   proceed immediately.
565
   recoll will give you a chance to edit the configuration file before
566
   starting indexation. recollindex will proceed immediately.
536
567
537
   Most of the parameters specific to the recoll GUI are set through the
568
   Most of the parameters specific to the recoll GUI are set through the
538
   Preferences menu and stored in the standard QT place ($HOME/.qt/recollrc).
569
   Preferences menu and stored in the standard QT place ($HOME/.qt/recollrc).
539
   You probably do not want to edit this by hand.
570
   You probably do not want to edit this by hand.
540
571
541
   For other options, Recoll uses text configuration files. You will have to
572
   For other options, Recoll uses text configuration files. You will have to
542
   edit them by hand for now (there is still some hope for a GUI
573
   edit them by hand for now (there is still some hope for a GUI
543
   configuration tool in the future). The most accurate documentation for the
574
   configuration tool in the future). The most accurate documentation for the
544
   configuration parameters is given by comments inside the sample files, and
575
   configuration parameters is given by comments inside the default files,
545
   we will just give a general overview here.
576
   and we will just give a general overview here.
546
577
547
   All configuration files share the same format. For example, a short
578
   All configuration files share the same format. For example, a short
548
   extract of the main configuration file might look as follows:
579
   extract of the main configuration file might look as follows:
549
580
550
         # Space-separated list of directories to index.
581
         # Space-separated list of directories to index.
...
...
575
606
576
     ----------------------------------------------------------------------
607
     ----------------------------------------------------------------------
577
608
578
  4.3.1. Main configuration file
609
  4.3.1. Main configuration file
579
610
580
   ~/.recoll/recoll.conf is the main configuration file. It defines things
611
   recoll.conf is the main configuration file. It defines things like what to
581
   like what to index (top directories and things to ignore), and the default
612
   index (top directories and things to ignore), and the default character
582
   character set to use for document types which do not specify it
613
   set to use for document types which do not specify it internally.
583
   internally.
584
614
585
   The default configuration will index your home directory. If this is not
615
   The default configuration will index your home directory. If this is not
586
   appropriate, use recoll to copy the sample configuration, click Cancel,
616
   appropriate, use recoll to copy the sample configuration, click Cancel,
587
   and edit the configuration file before restarting the command. This will
617
   and edit the configuration file before restarting the command. This will
588
   start the initial indexation, which may take some time.
618
   start the initial indexation, which may take some time.
...
...
668
           determining the mime type for a file (the main procedure uses
698
           determining the mime type for a file (the main procedure uses
669
           suffix associations as defined in the mimemap file). This can be
699
           suffix associations as defined in the mimemap file). This can be
670
           useful for files with suffixless names, but it will also cause the
700
           useful for files with suffixless names, but it will also cause the
671
           indexation of many bogus "text" files.
701
           indexation of many bogus "text" files.
672
702
703
   indexallfilenames
704
705
           Recoll indexes file names in a special section of the database to
706
           allow specific file name searches using wildcards. This
707
           parameter decides if file name indexing is performed only for
708
           files with mime types that would qualify them for full text
709
           indexation, or for all files inside the selected subtrees,
710
           independent of mime type.
711
673
     ----------------------------------------------------------------------
712
     ----------------------------------------------------------------------
674
713
675
  4.3.2. The mimemap file
714
  4.3.2. The mimemap file
676
715
677
   ~/.recoll/mimemap specifies the file name extension to mime type mappings.
716
   mimemap specifies the file name extension to mime type mappings.
678
717
679
   For file names without an extension, or with an unknown one, the system's
718
   For file names without an extension, or with an unknown one, the system's
680
   file -i command will be executed to determine the mime type (this can be
719
   file -i command will be executed to determine the mime type (this can be
681
   switched off inside the main configuration file).
720
   switched off inside the main configuration file).
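
   The two-step procedure described above (suffix lookup first, then the
   file -i command) can be sketched as follows. This is an illustration
   under assumptions, not Recoll's implementation: the function names, the
   tiny SUFFIX_MAP extract and the use_file_command switch are hypothetical,
   and the parsing assumes file -i prints "name: type/subtype; charset=...".

```python
import posixpath
import subprocess

# Hypothetical extract of suffix associations; the real ones live in mimemap.
SUFFIX_MAP = {
    ".txt": "text/plain",
    ".html": "text/html",
    ".pdf": "application/pdf",
}


def run_file_command(path):
    """Ask the system's `file -i` command for the MIME type of path."""
    result = subprocess.run(
        ["file", "-i", path], capture_output=True, text=True, check=True
    )
    # Keep what follows the last ':' and drop the '; charset=...' part.
    return result.stdout.rsplit(":", 1)[-1].split(";")[0].strip()


def guess_mime_type(path, suffix_map=SUFFIX_MAP, use_file_command=True,
                    fallback=run_file_command):
    """Look the suffix up first; fall back to `file -i` when allowed."""
    suffix = posixpath.splitext(path)[1]
    if suffix in suffix_map:
        return suffix_map[suffix]
    if use_file_command:
        return fallback(path)
    # Fallback switched off: the type stays unknown.
    return None
```

   Passing use_file_command=False mimics switching the fallback off, in
   which case a file with no known suffix simply gets no type.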
682
721
...
...
697
736
698
     ----------------------------------------------------------------------
737
     ----------------------------------------------------------------------
699
738
700
  4.3.3. The mimeconf file
739
  4.3.3. The mimeconf file
701
740
702
   ~/.recoll/mimeconf specifies how the different mime types are handled for
741
   mimeconf specifies how the different mime types are handled for
703
   indexation, and for display.
742
   indexation, and for display.
704
743
705
   Changing the indexation parameters is probably not a good idea except if
744
   Changing the indexation parameters is probably not a good idea except if
706
   you are a Recoll developer.
745
   you are a Recoll developer.
707
746
708
   You may want to adjust the external viewers defined in it (e.g. html is either
747
   You may want to adjust the external viewers defined in it (e.g. html is either
709
   previewed internally or displayed using firefox, but you may prefer
748
   previewed internally or displayed using firefox, but you may prefer
749
   mozilla, your openoffice.org program might be named oofice instead of
710
   mozilla...). Look for the [view] section.
750
   openoffice ...). Look for the [view] section.
711
751
712
   You can also change the icons which are displayed by recoll in the result
752
   You can also change the icons which are displayed by recoll in the result
713
   lists (the values are the basenames of the png images inside the iconsdir
753
   lists (the values are the basenames of the png images inside the iconsdir
714
   directory, specified in recoll.conf).
754
   directory, specified in recoll.conf).
715
755