
a/src/doc/user/usermanual.sgml b/src/doc/user/usermanual.sgml
...
...
22
      <year>2005</year>
22
      <year>2005</year>
23
      <holder role="mailto:jean-francois.dockes@wanadoo.fr">Jean-Francois
23
      <holder role="mailto:jean-francois.dockes@wanadoo.fr">Jean-Francois
24
      Dockes</holder>
24
      Dockes</holder>
25
    </copyright>
25
    </copyright>
26
26
27
    <releaseinfo>$Id: usermanual.sgml,v 1.4 2006-01-19 12:01:42 dockes Exp $</releaseinfo>
27
    <releaseinfo>$Id: usermanual.sgml,v 1.5 2006-02-01 07:05:06 dockes Exp $</releaseinfo>
28
28
29
    <abstract>
29
    <abstract>
30
      <para>This document introduces full text search notions
30
      <para>This document introduces full text search notions
31
      and describes the installation and use of the &RCL; application.</para>
31
      and describes the installation and use of the &RCL; application.</para>
32
    </abstract>
32
    </abstract>
...
...
114
      <para>In practice, &XAP; works by remembering where terms appear
114
      <para>In practice, &XAP; works by remembering where terms appear
115
      in your document files. The acquisition process is called
115
      in your document files. The acquisition process is called
116
      indexation. </para> 
116
      indexation. </para> 
117
117
118
      <para>The resulting database can be big (roughly the size of the
118
      <para>The resulting database can be big (roughly the size of the
119
        original document set), but it  is not a document archive. &RCL;
119
        original document set), but it is not a document
120
        can only display documents that still exist at the place from which
120
        archive. &RCL; can only display documents that still exist at
121
        they were indexed.</para>
121
        the place from which they were indexed. (Actually, there is a
122
        way to reconstruct a document from the information in the
123
        database, but the result is not nice, as all formatting,
124
        punctuation and capitalisation are lost).</para>
122
125
123
      <para>&RCL; stores all internal data in <application>Unicode
126
      <para>&RCL; stores all internal data in <application>Unicode
124
      UTF-8</application> format, and it can index files with
127
      UTF-8</application> format, and it can index files with
125
      different character sets, encodings, and languages into the same
128
      different character sets, encodings, and languages into the same
126
      database. It has input filters for many document types.</para>
129
      database. It has input filters for many document types.</para>
...
...
174
177
175
      <para>&RCL; indexation takes place at discrete times. There is
178
      <para>&RCL; indexation takes place at discrete times. There is
176
      currently no interface to real time file modification
179
      currently no interface to real time file modification
177
      monitors. The typical usage is to have a nightly indexation run
180
      monitors. The typical usage is to have a nightly indexation run
178
      <link linkend="rcl.indexing.automat">programmed</link> into your
181
      <link linkend="rcl.indexing.automat">programmed</link> into your
179
      <command>cron</command> file.</para> 
182
      <command>cron</command> file.</para>
183
184
      <sidebar><para>Side note: there is nothing in &RCL; and &XAP;
185
      that would prevent interfacing with a real time file
186
      modification monitor, but this would tend to consume significant
187
      system resources for dubious gain, because you rarely need a
188
      full text search to find documents you just
189
      modified. <command>recollindex -i</command>  can be used to add
190
      individual files to the index if you want to play with this; see
191
      the manual page.</para>
192
      </sidebar>
193
180
194
181
      <para>&RCL; knows about quite a few different document
195
      <para>&RCL; knows about quite a few different document
182
      types. The parameters for document types recognition and
196
      types. The parameters for document types recognition and
183
      processing are set in 
197
      processing are set in 
184
       <link linkend="rcl.indexing.config">configuration files</link>
198
       <link linkend="rcl.indexing.config">configuration files</link>
...
...
276
    <application>QT</application> library.</para>
290
    <application>QT</application> library.</para>
277
291
278
    <sect1 id="rcl.search.simple">
292
    <sect1 id="rcl.search.simple">
279
      <title>Simple search</title>
293
      <title>Simple search</title>
280
294
295
      <procedure>
281
      <para>Start the <command>recoll</command> program, then
296
  <step><para>Start the <command>recoll</command> program.</para>
297
  </step>
282
        enter search term(s) in the text field at the top left of the
298
  <step><para>Enter search term(s) in the text field at the top of the
299
        window.</para>
300
  </step>
283
        window. Clicking the <guilabel>Search</guilabel> button or
301
  <step><para>Click the <guilabel>Search</guilabel> button or
284
        hitting the <keycap>Enter</keycap> key will start a search. By
302
        hit the <keycap>Enter</keycap> key to start the search.</para>
303
  </step>
304
      </procedure>
305
285
        default, this will look for documents with any of the terms
306
      <para>By default, this will look for documents with any of the
286
        (the ones with more terms will get better scores). You can
307
      search terms (the ones with more terms will get better scores). You can
287
        check the <guilabel>All terms</guilabel> checkbox to ensure
308
        check the <guilabel>All terms</guilabel> checkbox to ensure
288
        that only documents with all the terms will be returned. Use
309
        that only documents with all the terms will be returned. Use
289
        the <guilabel>Tools</guilabel> / <guilabel>Advanced
310
        the <guilabel>Tools</guilabel> / <guilabel>Advanced
290
        search</guilabel> dialog for more complex searches.</para>
311
        search</guilabel> dialog for more complex searches.</para>
291
312
...
...
301
      relevance (how well the system estimates that the document
322
      relevance (how well the system estimates that the document
302
      matches the query). You can specify a different ordering by
323
      matches the query). You can specify a different ordering by
303
      using the  <link linkend="rcl.search.sort"><guilabel>Tools</guilabel>
324
      using the  <link linkend="rcl.search.sort"><guilabel>Tools</guilabel>
304
        / <guilabel>Sort parameters</guilabel></link> dialog.</para>
325
        / <guilabel>Sort parameters</guilabel></link> dialog.</para>
305
326
327
      <para>You can click on the first paragraph (<literal>Query
328
      results</literal> or <literal>No results found</literal>) in the
329
      result list to get an exact display of the query actually
330
      performed, after stem expansion and other processing.</para>
331
306
    </sect1>
332
    </sect1>
307
333
308
      <sect1 id="rcl.search.complex">
334
      <sect1 id="rcl.search.complex">
309
      <title>Complex/advanced search</title>
335
      <title>Complex/advanced search</title>
310
336
311
      <para>The advanced search dialog has fields that will allow a more
337
      <para>The advanced search dialog has fields that will allow a more
312
        refined search, looking for documents with all given words, a
338
        refined search, looking for documents with all given words, a
313
        given exact phrase, or none of the given words (all fields may
339
        given exact phrase, or none of the given words (all relevant fields
314
        be combined by an implicit AND clause).</para>
340
        will be combined by an implicit AND clause).</para>
315
341
316
      <para>It will let you search for documents of specific mime
342
      <para>It will let you search for documents of specific mime
317
        types (ie: only <literal>text/plain</literal>, or
343
        types (ie: only <literal>text/plain</literal>, or
318
        <literal>text/html</literal> or
344
        <literal>text/html</literal> or
319
        <literal>application/pdf</literal> etc...)</para>
345
        <literal>application/pdf</literal> etc...)</para>
...
...
322
        the indexed area.</para>
348
        the indexed area.</para>
323
349
324
      <para>Click on the <guilabel>Start Search</guilabel> button in
350
      <para>Click on the <guilabel>Start Search</guilabel> button in
325
      the advanced search dialog to start the search. The button in
351
      the advanced search dialog to start the search. The button in
326
      the main window always performs a simple search.</para>
352
      the main window always performs a simple search.</para>
353
354
      <para>Click on the result list header paragraph to see the query
355
      expansion.</para>
327
356
328
    </sect1>
357
    </sect1>
329
358
330
    <sect1 id="rcl.search.history">
359
    <sect1 id="rcl.search.history">
331
      <title>Document history</title>
360
      <title>Document history</title>
...
...
378
        <guilabel>This exact phrase</guilabel> field of the advanced
407
        <guilabel>This exact phrase</guilabel> field of the advanced
379
        search dialog to the same effect.</para>
408
        search dialog to the same effect.</para>
380
      </formalpara>
409
      </formalpara>
381
410
382
      <formalpara><title>Query explanation</title>
411
      <formalpara><title>Query explanation</title>
383
  <para>You can get an exact description of what the query
412
        <para>You can get an exact description of what the query
384
  looked for, including stem expansion, and boolean operators
413
        looked for, including stem expansion, and boolean operators
385
  used, by clicking on the result list header.</para>
414
        used, by clicking on the result list header.</para>
386
      </formalpara>
415
      </formalpara>
387
416
388
      <formalpara><title>Quitting</title>
417
      <formalpara><title>Quitting</title>
389
      <para>Entering <keycap>^Q</keycap> almost anywhere will
418
      <para>Entering <keycap>^Q</keycap> almost anywhere will
390
        close the application.</para>
419
        close the application.</para>
...
...
401
430
402
      <para>It is possible to customise some aspects of the search
431
      <para>It is possible to customise some aspects of the search
403
      interface by using the <guimenu>Query configuration</guimenu> entry
432
      interface by using the <guimenu>Query configuration</guimenu> entry
404
      in the <guimenu>Preferences</guimenu> menu.</para>
433
      in the <guimenu>Preferences</guimenu> menu.</para>
405
434
406
      <para>There are two tabs in the dialog, to modify the appearance
435
      <para>There are two tabs in the dialog, dealing with the
407
      of the user interface (result list appearance), or the
436
      interface itself, and with the parameters used for searching and
408
      parameters used for searching (language used for stem
437
      returning results.</para> 
409
      expansion).</para> 
410
438
411
      <para>The stemming language can be chosen among those that were
439
      <para>User interface parameters:</para>
412
      specified in the configuration file, or later added with
440
      <itemizedlist>
441
442
  <listitem><para><guilabel>Number of results in a result
443
  page</guilabel></para> 
444
  </listitem>
445
446
  <listitem><para><guilabel>Result list font</guilabel>: There
447
  is quite a lot of information shown in the result list, and
448
  you may want to customise the font and/or font size. The rest
449
  of the fonts used by &RCL; are determined by your generic QT
450
  config (try the <command>qtconfig</command> command).</para>
451
  </listitem>
452
453
  <listitem><para><guilabel>Html help browser</guilabel>: this
454
  will let you choose the preferred browser which will be
455
  started from the <guimenu>Help</guimenu> menu to read the user
456
  manual. You can enter a simple name if the command is in your
457
  PATH, or browse for a full pathname.</para>
458
  </listitem>
459
  <listitem><para><guilabel>Show document type icons in result
460
  list</guilabel>: icons in the result list can be turned
461
  off. They take quite a lot of space and convey relatively
462
  little useful information.</para>
463
  </listitem>
464
      </itemizedlist>
465
466
      <para>Search parameters:</para>
467
468
      <itemizedlist>
469
  <listitem><para><guilabel>Stemming language</guilabel>:
470
  stemming obviously depends on the document's language. This
471
  listbox will let you choose among the stemming databases which
472
  were built during indexing (this is set in the <link
473
  linkend="rcl.install.config.recollconf">main configuration
474
  file</link>), or later added with
413
      <command>recollindex -s</command> (See the recollindex
475
      <command>recollindex -s</command> (See the recollindex
414
      manual). Stemming languages which are dynamically added will be
476
      manual). Stemming languages which are dynamically added will be
415
      deleted at the next indexation pass unless they are also added in
477
      deleted at the next indexation pass unless they are also added in
416
      the configuration file.</para>
478
      the configuration file.</para>
479
  </listitem>
480
481
  <listitem><para><guilabel>Dynamically build
482
  abstracts</guilabel>: this decides if &RCL; tries to build
483
  document abstracts when displaying the result list. Abstracts
484
  are constructed by taking context from the document
485
  information, around the search terms. This can slow down
486
  result list display significantly for big documents, and you
487
  may want to turn it off.</para>
488
  </listitem>
489
  <listitem><para><guilabel>Replace abstracts from
490
  documents</guilabel>: this decides if we should synthesize and
491
  display an abstract in place of an explicit abstract found
492
  within the document itself.</para>
493
  </listitem>
494
      </itemizedlist>
417
495
418
    </sect1>
496
    </sect1>
419
497
420
  </chapter>
498
  </chapter>
421
499
...
...
425
503
426
      <sect1 id="rcl.install.building">
504
      <sect1 id="rcl.install.building">
427
      <title>Building from source</title>
505
      <title>Building from source</title>
428
506
429
      <sect2 id="rcl.install.building.prereqs">
507
      <sect2 id="rcl.install.building.prereqs">
430
  <title>Prerequisites</title>
508
        <title>Prerequisites</title>
431
509
432
      <para>At the very least, you will need to download and install the
510
      <para>At the very least, you will need to download and install the
433
  <ulink url="http://www.xapian.org">xapian core
511
        <ulink url="http://www.xapian.org">xapian core package</ulink>
434
  package</ulink> (&RCL; currently uses version 0.9.2), and the <ulink
512
        (&RCL; currently uses version 0.9.2), and the <ulink
435
  url="http://www.trolltech.com/products/qt/index.html">qt
513
        url="http://www.trolltech.com/products/qt/index.html">qt
436
    runtime and development packages</ulink> (&RCL; currently uses
514
          runtime and development packages</ulink> (&RCL; development
437
  version 3.3.3).</para>
515
          currently uses version 3.3.5, but any 3.3 version is
516
          probably ok).</para> 
438
517
439
      <para>You will most probably be able to find a binary package for
518
      <para>You will most probably be able to find a binary package for
440
  <application>qt</application> for your system. You may have to
519
        <application>qt</application> for your system. You may have to
441
  compile <application>Xapian</application>, 
442
  but this is not difficult (if you are using
520
        compile &XAP; but this is not difficult (if you are using
443
  <application>FreeBSD</application>, there is a port).</para>
521
        <application>FreeBSD</application>, there is a port).</para>
444
522
445
      <para>You may also need 
523
      <para>You may also need 
446
  <ulink
524
        <ulink
447
  url="http://www.gnu.org/software/libiconv/">libiconv</ulink>. &RCL;
525
        url="http://www.gnu.org/software/libiconv/">libiconv</ulink>. &RCL;
448
  currently uses version 1.9 (this should not be critical). On
526
        currently uses version 1.9 (this should not be critical). On
449
  <application>Linux</application> systems, the iconv interface
527
        <application>Linux</application> systems, the iconv interface
450
  is part of libc and you should not need to do anything
528
        is part of libc and you should not need to do anything
451
  special.</para>
529
        special.</para>
452
      
530
      
453
      <formalpara><title>External file types</title><para>&RCL; uses
531
      <formalpara><title>External file types</title><para>&RCL; uses
454
      external applications 
532
      external applications 
455
  to index some file types. You need to install them for the
533
        to index some file types. You need to install them for the
456
  file types that you wish to have indexed:</para>
534
        file types that you wish to have indexed:</para>
457
  </formalpara>
535
        </formalpara>
458
536
459
      <itemizedlist>
537
      <itemizedlist>
460
538
461
  <listitem><para>MS Word: <ulink
539
        <listitem><para>MS Word: <ulink
462
  url="http://www.winfield.demon.nl"> 
540
        url="http://www.winfield.demon.nl"> 
463
      antiword</ulink>.</para>
541
            antiword</ulink>.</para>
464
    </listitem>
542
          </listitem>
465
543
466
  <listitem><para>PDF: pdftotext is part of the <ulink
544
        <listitem><para>PDF: pdftotext is part of the <ulink
467
      url="http://www.foolabs.com/xpdf/">Xpdf</ulink> package.</para>
545
            url="http://www.foolabs.com/xpdf/">Xpdf</ulink> package.</para>
468
    </listitem>
546
          </listitem>
469
547
470
  <listitem><para>Postscript: <ulink
548
        <listitem><para>Postscript: <ulink
471
    url="http://www.cs.wisc.edu/~ghost/doc/pstotext.htm">
549
          url="http://www.cs.wisc.edu/~ghost/doc/pstotext.htm">
472
      pstotext</ulink>.</para>
550
            pstotext</ulink>.</para>
551
          </listitem>
552
473
    </listitem>
553
        <listitem>
474
475
  <listitem>
476
      <para>RTF: <ulink
554
            <para>RTF: <ulink
477
      url="http://www.gnu.org/software/unrtf/unrtf.html">unrtf</ulink>
555
            url="http://www.gnu.org/software/unrtf/unrtf.html">unrtf</ulink>
478
          </para>
556
          </para>
479
    </listitem>
557
          </listitem>
480
    
558
          
481
  </itemizedlist>
559
        </itemizedlist>
482
560
483
      <sect2 id="rcl.install.building.build">
561
      <sect2 id="rcl.install.building.build">
484
  <title>Building</title>
562
        <title>Building</title>
485
563
486
      <para>&RCL; has been built on
564
      <para>&RCL; has been built on
487
  Linux (redhat7.3, mandriva 2005, Fedora Core 3), FreeBSD and
565
        Linux (redhat7.3, mandriva 2005, Fedora Core 3), FreeBSD and
488
  Solaris 8. If you build on another system, <ulink
566
        Solaris 8. If you build on another system, <ulink
489
  url="mailto:jean-francois.dockes@wanadoo.fr">I would very much
567
        url="mailto:jean-francois.dockes@wanadoo.fr">I would very much
490
  welcome patches</ulink>.</para>
568
        welcome patches</ulink>.</para>
491
569
492
      <para>Depending on the <application>qt</application>
570
      <para>Depending on the <application>qt</application>
493
      configuration on your system, you may have to set the
571
      configuration on your system, you may have to set the
494
      <literal>QTDIR</literal> and <literal>QMAKESPECS</literal>
572
      <literal>QTDIR</literal> and <literal>QMAKESPECS</literal>
495
      variables in your environment:</para>
573
      variables in your environment:</para>
496
  <itemizedlist>
574
        <itemizedlist>
497
    <listitem><para><literal>QTDIR</literal> should point to the
575
          <listitem><para><literal>QTDIR</literal> should point to the
498
    directory above the one that holds the qt include files (ie:
576
          directory above the one that holds the qt include files (ie:
499
    qt.h).</para>
577
          qt.h).</para>
500
    </listitem>
578
          </listitem>
501
    <listitem><para><literal>QMAKESPECS</literal> should
579
          <listitem><para><literal>QMAKESPECS</literal> should
502
    be set to the name of one of the
580
          be set to the name of one of the
503
    <application>qt</application> mkspecs subdirectories (ie:
581
          <application>qt</application> mkspecs subdirectories (ie:
504
    linux-g++).</para> 
582
          linux-g++).</para> 
505
    </listitem>
583
          </listitem>
506
  </itemizedlist>
584
        </itemizedlist>
507
585
508
  <para>On many Linux systems, <literal>QTDIR</literal> is set
586
        <para>On many Linux systems, <literal>QTDIR</literal> is set
509
  by the login scripts, and <literal>QMAKESPECS</literal> is not
587
        by the login scripts, and <literal>QMAKESPECS</literal> is not
510
  needed because there is a <filename>default</filename> link in
588
        needed because there is a <filename>default</filename> link in
511
  <filename>mkspecs/</filename>.</para>
589
        <filename>mkspecs/</filename>.</para>
512
590
513
  <para>The &RCL; <command>configure</command> script does a
591
        <para>The &RCL; <command>configure</command> script does a
514
  better job of checking these variables after release
592
        better job of checking these variables after release
515
  1.1.1. Before this, unexplained errors will occur during
593
        1.1.1. Before this, unexplained errors will occur during
516
  compilation if the environment is not set up. Also, for 1.1.0 the
594
        compilation if the environment is not set up. Also, for 1.1.0 the
517
  <command>qmake</command> command should be in your PATH (later
595
        <command>qmake</command> command should be in your PATH (later
518
  releases can also find it in
596
        releases can also find it in
519
  <filename>$QTDIR/bin</filename>).</para> 
597
        <filename>$QTDIR/bin</filename>).</para> 
520
598
521
      <para>Normal procedure:</para>
599
      <para>Normal procedure:</para>
522
      <screen>
600
      <screen>
523
        <userinput>cd recoll-xxx</userinput>
601
        <userinput>cd recoll-xxx</userinput>
524
        <userinput>configure</userinput>
602
        <userinput>configure</userinput>
...
...
526
        <userinput>(practises usual hardship-repelling invocations)</userinput>
604
        <userinput>(practises usual hardship-repelling invocations)</userinput>
527
      </screen>
605
      </screen>
528
606
529
607
530
      <para>There is little autoconfiguration. The
608
      <para>There is little autoconfiguration. The
531
  <command>configure</command> script will mainly link one of
609
        <command>configure</command> script will mainly link one of
532
  the system-specific files in the <filename>mk</filename>
610
        the system-specific files in the <filename>mk</filename>
533
  directory to <filename>mk/sysconf</filename>. If your system
611
        directory to <filename>mk/sysconf</filename>. If your system
534
  is not known yet, it will tell you as much, and you may want
612
        is not known yet, it will tell you as much, and you may want
535
  to manually copy and modify one of the existing files (the new
613
        to manually copy and modify one of the existing files (the new
536
  file name should be the output of <command>uname -s</command>).</para>
614
        file name should be the output of <command>uname -s</command>).</para>
537
      </sect2>
615
      </sect2>
538
616
539
      <sect2 id="rcl.install.building.install">
617
      <sect2 id="rcl.install.building.install">
540
  <title>Installation</title>
618
        <title>Installation</title>
541
      
619
      
542
      <para>Either type <userinput>make install</userinput> or execute
620
      <para>Either type <userinput>make install</userinput> or execute
543
      <userinput>recollinstall targetdir</userinput>, in the root
621
      <userinput>recollinstall targetdir</userinput>, in the root
544
  of the source tree. This will copy the commands to
622
        of the source tree. This will copy the commands to
545
  <filename>$targetdir/bin</filename> and the sample
623
        <filename>$targetdir/bin</filename> and the sample
546
  configuration files, scripts and other shared data to 
624
        configuration files, scripts and other shared data to 
547
  <filename>$targetdir/share/recoll</filename>.</para>
625
        <filename>$targetdir/share/recoll</filename>.</para>
548
      </sect2>
626
      </sect2>
549
    </sect1>
627
    </sect1>
550
628
551
    <sect1 id="rcl.install.binary">
629
    <sect1 id="rcl.install.binary">
552
      <title>Installing a prebuilt copy</title>
630
      <title>Installing a prebuilt copy</title>
553
631
554
      <sect2 id="rcl.install.binary.package">
632
      <sect2 id="rcl.install.binary.package">
555
  <title>Installing through a package system</title>
633
        <title>Installing through a package system</title>
556
634
557
  <para>If you are lucky enough to be using a port system or a
635
        <para>If you are lucky enough to be using a port system or a
558
  prebuilt package (RPM or other), just follow the usual
636
        prebuilt package (RPM or other), just follow the usual
559
  procedure, and have a look at the <link
637
        procedure, and have a look at the <link
560
  linkend="rcl.install.config">configuration
638
        linkend="rcl.install.config">configuration
561
  section</link>.</para>
639
        section</link>.</para>
562
      </sect2>
640
      </sect2>
563
641
564
      <sect2 id="rcl.install.binary.rcl">
642
      <sect2 id="rcl.install.binary.rcl">
565
  <title>Installing a prebuilt &RCL;</title>
643
        <title>Installing a prebuilt &RCL;</title>
566
644
567
      <para>The unpackaged binary versions are just compressed tar
645
      <para>The unpackaged binary versions are just compressed tar
568
      files of a build
646
      files of a build
569
  tree, where only the useful parts were kept (executables and
647
        tree, where only the useful parts were kept (executables and
570
  sample configuration).</para>
648
        sample configuration).</para>
571
649
572
      <para>The executable binary files are built with a static link to
650
      <para>The executable binary files are built with a static link to
573
  libxapian and libiconv, to make installation easier (no
651
        libxapian and libiconv, to make installation easier (no
574
  dependencies). However, this also means that you cannot change
652
        dependencies). However, this also means that you cannot change
575
  the versions which are used.</para> 
653
        the versions which are used.</para> 
576
654
577
      <para>After extracting the tar file, you can proceed with
655
      <para>After extracting the tar file, you can proceed with
578
  <link
656
        <link
579
  linkend="rcl.install.building.install">installation</link> as
657
        linkend="rcl.install.building.install">installation</link> as
580
  if you had built the package from source.</para> 
658
        if you had built the package from source.</para> 
581
      </sect2>
659
      </sect2>
582
    </sect1>
660
    </sect1>
583
661
584
    <sect1 id="rcl.install.config">
662
    <sect1 id="rcl.install.config">
585
      <title>Configuration overview</title>
663
      <title>Configuration overview</title>
586
664
587
      <para>The personal configuration files and the database are
665
      <para>The personal configuration files and the database are
588
        normally kept in
666
        normally kept in
589
  the <filename>.recoll</filename> directory in your home (this
667
        the <filename>.recoll</filename> directory in your home (this
590
  can be changed with the <literal>RECOLL_CONFDIR</literal>
668
        can be changed with the <literal>RECOLL_CONFDIR</literal>
591
  environment variable, and a parameter inside the main
669
        environment variable, and a parameter inside the main
592
  configuration file). If this directory does not exist when
670
        configuration file). If this directory does not exist when
593
    <command>recoll</command> or 
671
          <command>recoll</command> or 
594
  <command>recollindex</command> are started, the
672
        <command>recollindex</command> are started, the
595
  directory will be created and the sample configuration files will
673
        directory will be created and the sample configuration files will
596
  be copied. <command>recoll</command> will give you a
674
        be copied. <command>recoll</command> will give you a
597
  chance to edit the configuration file before starting
675
        chance to edit the configuration file before starting
598
  indexation. <command>recollindex</command> will
676
        indexation. <command>recollindex</command> will
599
  proceed immediately.</para>
677
        proceed immediately.</para>
600
      
678
      
601
      <para>Most of the parameters specific to the
679
      <para>Most of the parameters specific to the
602
   <command>recoll</command> GUI are set through the
680
         <command>recoll</command> GUI are set through the
603
    <guilabel>Preferences</guilabel> menu and stored in the
681
          <guilabel>Preferences</guilabel> menu and stored in the
604
    standard QT place
682
          standard QT place
605
    (<filename>$HOME/.qt/recollrc</filename>). You probably do not
683
          (<filename>$HOME/.qt/recollrc</filename>). You probably do not
606
    want to edit this by hand.</para>
684
          want to edit this by hand.</para>
607
685
608
      <para>For other options, &RCL; uses text configuration
686
      <para>For other options, &RCL; uses text configuration
609
        files. You will have to edit them by hand for 
687
        files. You will have to edit them by hand for 
610
  now (there is still some hope for a GUI configuration tool
688
        now (there is still some hope for a GUI configuration tool
611
  in the future). The most accurate documentation for the
689
        in the future). The most accurate documentation for the
612
  configuration parameters is given by comments inside the sample
690
        configuration parameters is given by comments inside the sample
613
  files, and we will just give a general overview here.</para>
691
        files, and we will just give a general overview here.</para>
614
692
615
  <para>All configuration files share the same format. For
693
        <para>All configuration files share the same format. For
616
  example, a short extract of the main configuration file might
694
        example, a short extract of the main configuration file might
617
  look as follows:</para> 
695
        look as follows:</para> 
618
  <programlisting>
696
        <programlisting>
619
        # Space-separated list of directories to index.
697
        # Space-separated list of directories to index.
620
        topdirs =  ~/docs /usr/share/doc
698
        topdirs =  ~/docs /usr/share/doc
621
699
622
        [~/somedirectory-with-utf8-txt-files]
700
        [~/somedirectory-with-utf8-txt-files]
623
        defaultcharset = utf-8
701
        defaultcharset = utf-8
624
        </programlisting>
702
        </programlisting>
625
703
626
  <para>There are three kinds of lines: </para>
704
        <para>There are three kinds of lines: </para>
627
  <itemizedlist>
705
        <itemizedlist>
628
    <listitem><para>Comment (starts with
706
          <listitem><para>Comment (starts with
629
    <emphasis>#</emphasis>) or empty.</para> 
707
          <emphasis>#</emphasis>) or empty.</para> 
630
    </listitem>
708
          </listitem>
631
    <listitem><para>Parameter assignment (<emphasis>name =
709
          <listitem><para>Parameter assignment (<emphasis>name =
632
    value</emphasis>).</para> 
710
          value</emphasis>).</para> 
633
    </listitem>
711
          </listitem>
634
    <listitem><para>Section definition
712
          <listitem><para>Section definition
635
    ([<emphasis>somedirname</emphasis>]).</para> 
713
          ([<emphasis>somedirname</emphasis>]).</para> 
636
    </listitem>
714
          </listitem>
637
  </itemizedlist>
715
        </itemizedlist>
638
716
639
  <para>Section lines allow redefining some parameters for a
717
        <para>Section lines allow redefining some parameters for a
640
  directory subtree. Some of the parameters used for indexation
718
        directory subtree. Some of the parameters used for indexation
641
  are looked up hierarchically from the more to the less
719
        are looked up hierarchically from the more to the less
642
  specific. Not all parameters can be meaningfully redefined;
720
        specific. Not all parameters can be meaningfully redefined;
643
  this is specified for each in the next section. </para>
721
        this is specified for each in the next section. </para>
644
722
645
  <para>The tilde character (~) is expanded in file names to the
723
        <para>The tilde character (~) is expanded in file names to the
646
  name of the user's home directory.</para>
724
        name of the user's home directory.</para>
647
  
725
        
648
  <para>White space is used for separation inside  lists.
726
        <para>White space is used for separation inside  lists.
649
        Elements with embedded spaces can be quoted using
727
        Elements with embedded spaces can be quoted using
650
        double-quotes.</para>
728
        double-quotes.</para>
651
729
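        <para>As a sketch of the quoting rule (the directory names
        below are hypothetical, chosen only for illustration), a list
        containing an element with an embedded space might be written
        as:</para>
        <programlisting>
        # Hypothetical example: the second element contains a space,
        # so it is enclosed in double-quotes.
        topdirs = ~/docs "/home/me/My Documents"
        </programlisting>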
652
      <sect2 id="rcl.install.config.recollconf">
730
      <sect2 id="rcl.install.config.recollconf">
653
  <title>Main configuration file</title>
731
        <title>Main configuration file</title>
654
732
655
  <para><filename>~/.recoll/recoll.conf</filename> is the main
733
        <para><filename>~/.recoll/recoll.conf</filename> is the main
656
         configuration file. It defines things like
734
         configuration file. It defines things like
657
   what to index (top directories and things to ignore), and the
735
         what to index (top directories and things to ignore), and the
658
   default character set to use for document types which do not
736
         default character set to use for document types which do not
659
   specify it internally. </para>
737
         specify it internally. </para>
660
738
661
  <para>The default configuration will index your home
739
        <para>The default configuration will index your home
662
   directory. If this is not appropriate, use 
740
         directory. If this is not appropriate, use 
663
   <command>recoll</command> to copy the sample
741
         <command>recoll</command> to copy the sample
664
   configuration, click <guimenu>Cancel</guimenu>, and edit
742
         configuration, click <guimenu>Cancel</guimenu>, and edit
665
   the configuration file before restarting the command. This
743
         the configuration file before restarting the command. This
666
   will start the initial indexation, which may take some time.</para>
744
         will start the initial indexation, which may take some time.</para>
667
  
745
        
668
  <para>Parameters:</para>
746
        <para>Parameters:</para>
669
747
670
  <variablelist>
748
        <variablelist>
671
749
672
    <varlistentry><term><literal>topdirs</literal></term>
750
          <varlistentry><term><literal>topdirs</literal></term>
673
      <listitem><para>Specifies the list of directories to index
751
            <listitem><para>Specifies the list of directories to index
674
      (recursively).</para>
752
            (recursively).</para>
675
      </listitem>
753
            </listitem>
676
    </varlistentry>
754
          </varlistentry>
677
755
678
    <varlistentry><term><literal>skippedNames</literal></term>
756
          <varlistentry><term><literal>skippedNames</literal></term>
679
      <listitem>
757
            <listitem>
680
        <para>A space-separated list of patterns for
758
              <para>A space-separated list of patterns for
681
         names of files or directories that should be completely
759
               names of files or directories that should be completely
682
         ignored. The list defined in the default file is: </para>
760
               ignored. The list defined in the default file is: </para>
683
<programlisting>
761
<programlisting>
684
*~ #* bin CVS  Cache caughtspam  tmp
762
*~ #* bin CVS  Cache caughtspam  tmp
685
</programlisting>
763
</programlisting>
686
        <para>The list can be redefined for subdirectories, but is only
764
              <para>The list can be redefined for subdirectories, but is only
687
               actually changed for the top level ones in
765
               actually changed for the top level ones in
688
               <literal>topdirs</literal>.</para>
766
               <literal>topdirs</literal>.</para>
689
         <para>The top-level directories are not affected by this
767
               <para>The top-level directories are not affected by this
690
          list (that is, a directory in <literal>topdirs</literal>
768
                list (that is, a directory in <literal>topdirs</literal>
691
          might match and would still be indexed).</para>
769
                might match and would still be indexed).</para>
692
          <para>The list in the default configuration does not
770
                <para>The list in the default configuration does not
693
          exclude hidden directories (names beginning with a
771
                exclude hidden directories (names beginning with a
694
          dot), which means that it may index quite a few things
772
                dot), which means that it may index quite a few things
695
          that you do not want. On the other hand, mail user
773
                that you do not want. On the other hand, mail user
696
          agents like <application>thunderbird</application>
774
                agents like <application>thunderbird</application>
697
          usually store messages in hidden directories, and you
775
                usually store messages in hidden directories, and you
698
          probably want this indexed. One possible solution is to
776
                probably want this indexed. One possible solution is to
699
          have <userinput>.*</userinput> in
777
                have <userinput>.*</userinput> in
700
          <literal>skippedNames</literal>, and add things like
778
                <literal>skippedNames</literal>, and add things like
701
          <filename>~/.thunderbird</filename> or
779
                <filename>~/.thunderbird</filename> or
702
          <filename>~/.evolution</filename> in
780
                <filename>~/.evolution</filename> in
703
          <literal>topdirs</literal>.</para> 
781
                <literal>topdirs</literal>.</para> 
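                <para>A minimal sketch of that approach, assuming that
                the home directory is the main top directory and that
                the two mail directories named above are the ones you
                want indexed (adjust both lists to your own
                setup):</para>
<programlisting>
# Default patterns, plus .* to skip hidden files and directories
skippedNames = *~ #* bin CVS  Cache caughtspam  tmp .*
# Hidden directories listed here are top-level, so they are still indexed
topdirs = ~ ~/.thunderbird ~/.evolution
</programlisting>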
704
      </listitem>
782
            </listitem>
705
    </varlistentry>
783
          </varlistentry>
706
784
707
    <varlistentry><term><literal>loglevel</literal></term>
785
          <varlistentry><term><literal>loglevel</literal></term>
708
      <listitem><para>Verbosity level for recoll and
786
            <listitem><para>Verbosity level for recoll and
709
      recollindex. A value of 4 lists quite a lot of
787
            recollindex. A value of 4 lists quite a lot of
710
      debug/information messages. 2 only lists errors. </para>
788
            debug/information messages. 2 only lists errors. </para>
711
      </listitem>
789
            </listitem>
712
    </varlistentry>
790
          </varlistentry>
713
791
714
    <varlistentry><term><literal>logfilename</literal></term>
792
          <varlistentry><term><literal>logfilename</literal></term>
715
      <listitem><para>Where the messages should go. 'stderr' can
793
            <listitem><para>Where the messages should go. 'stderr' can
716
      be used as a special value. </para>
794
            be used as a special value. </para>
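            <para>For instance, a debugging setup might combine the
            two logging parameters as follows. This is only a sketch
            using the values mentioned above; the sample file comments
            remain the reference:</para>
<programlisting>
# Verbose debug/information messages, sent to the standard error output
loglevel = 4
logfilename = stderr
</programlisting>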
717
      </listitem>
795
            </listitem>
718
    </varlistentry>
796
          </varlistentry>
719
797
720
    <varlistentry><term><literal>filtersdir</literal></term>
798
          <varlistentry><term><literal>filtersdir</literal></term>
721
      <listitem><para>A directory to search for the external
799
            <listitem><para>A directory to search for the external
722
      filter scripts used to index some types of files. The
800
            filter scripts used to index some types of files. The
723
      value should not be changed, except if you want to modify
801
            value should not be changed, except if you want to modify
724
      one of the default scripts. The value can be redefined for
802
            one of the default scripts. The value can be redefined for
725
      any subdirectory. </para>
803
            any subdirectory. </para>
726
      </listitem>
804
            </listitem>
727
    </varlistentry>
805
          </varlistentry>
728
806
729
    <varlistentry><term><literal>indexstemminglanguages</literal></term>
807
          <varlistentry><term><literal>indexstemminglanguages</literal></term>
730
      <listitem><para>A list of languages for which the stem
808
            <listitem><para>A list of languages for which the stem
731
      expansion databases will be built. See recollindex(1) for
809
            expansion databases will be built. See recollindex(1) for
732
      possible values. You can add a stem expansion database for
810
            possible values. You can add a stem expansion database for
733
      a different language by using <command>recollindex
811
            a different language by using <command>recollindex
734
      -s</command>, but it will be deleted during the next
812
            -s</command>, but it will be deleted during the next
735
      indexation. Only languages listed in the configuration
813
            indexation. Only languages listed in the configuration
736
      file are permanent.</para>
814
            file are permanent.</para>
737
      </listitem>
815
            </listitem>
738
    </varlistentry>
816
          </varlistentry>
739
817
740
    <varlistentry><term><literal>iconsdir</literal></term>
818
          <varlistentry><term><literal>iconsdir</literal></term>
741
      <listitem><para>The name of the directory where
819
            <listitem><para>The name of the directory where
742
      <command>recoll</command> result list icons are
820
            <command>recoll</command> result list icons are
743
      stored. You can change this if you want different
821
            stored. You can change this if you want different
744
      images.</para>
822
            images.</para>
745
      </listitem>
823
            </listitem>
746
    </varlistentry>
824
          </varlistentry>
747
825
748
    <varlistentry><term><literal>dbdir</literal></term>
826
          <varlistentry><term><literal>dbdir</literal></term>
749
      <listitem><para>The name of the Xapian database
827
            <listitem><para>The name of the Xapian database
750
      directory. It will be created if needed when the database
828
            directory. It will be created if needed when the database
751
      is initialized. </para>
829
            is initialized. </para>
752
      </listitem>
830
            </listitem>
753
    </varlistentry>
831
          </varlistentry>
754
    
832
          
755
    <varlistentry><term><literal>defaultcharset</literal></term>
833
          <varlistentry><term><literal>defaultcharset</literal></term>
756
      <listitem><para>The name of the character set used for
834
            <listitem><para>The name of the character set used for
757
      files that do not contain a character set definition (e.g.
835
            files that do not contain a character set definition (e.g.
758
      plain text files). This can be redefined for any
836
            plain text files). This can be redefined for any
759
      subdirectory.</para></listitem></varlistentry>
837
            subdirectory.</para></listitem></varlistentry>
760
838
761
    <varlistentry><term><literal>guesscharset</literal></term>
839
          <varlistentry><term><literal>guesscharset</literal></term>
762
      <listitem><para>Decide if we try to guess the character
840
            <listitem><para>Decide if we try to guess the character
763
      set of files if no internal value is available (e.g. for
841
            set of files if no internal value is available (e.g. for
764
      plain text files). This does not work well in general, and
842
            plain text files). This does not work well in general, and
765
      should probably not be used. </para>
843
            should probably not be used. </para>
766
      </listitem>
844
            </listitem>
767
    </varlistentry>
845
          </varlistentry>
768
846
769
    <varlistentry><term><literal>usesystemfilecommand</literal></term>
847
          <varlistentry><term><literal>usesystemfilecommand</literal></term>
770
      <listitem><para>Decide if we use the <command>file -i</command>
848
            <listitem><para>Decide if we use the <command>file -i</command>
771
            system command as a final step for determining the mime
849
            system command as a final step for determining the mime
772
            type for a file (the main procedure uses suffix
850
            type for a file (the main procedure uses suffix
773
            associations as defined in the  <filename>mimemap</filename>
851
            associations as defined in the  <filename>mimemap</filename>
774
            file). This can be useful for files with suffixless names,
852
            file). This can be useful for files with suffixless names,
775
            but it will also cause the indexation of many bogus "text"
853
            but it will also cause the indexation of many bogus "text"
776
            files.</para> 
854
            files.</para> 
777
      </listitem>
855
            </listitem>
778
    </varlistentry>
856
          </varlistentry>
779
857
780
  </variablelist>
858
        </variablelist>
781
859
782
      </sect2>
860
      </sect2>
783
861
784
      <sect2 id="rclinstall.config.mimemap">
862
      <sect2 id="rclinstall.config.mimemap">
785
  <title>The mimemap file</title>
863
        <title>The mimemap file</title>
786
864
787
  <para><filename>~/.recoll/mimemap</filename> specifies the
865
        <para><filename>~/.recoll/mimemap</filename> specifies the
788
  file name extension to mime type mappings.</para> <para>For
866
        file name extension to mime type mappings.</para> <para>For
789
  file names without an extension, or with an unknown one, the
867
        file names without an extension, or with an unknown one, the
790
  system's <command>file -i</command> command will be executed
868
        system's <command>file -i</command> command will be executed
791
  to determine the mime type (this can be switched off inside
869
        to determine the mime type (this can be switched off inside
792
  the main configuration file).</para>
870
        the main configuration file).</para>
793
871
794
  <para><filename>mimemap</filename> also has a list of
872
        <para><filename>mimemap</filename> also has a list of
795
  extensions which should be ignored totally (to avoid losing
873
        extensions which should be ignored totally (to avoid losing
796
  time by executing <command>file</command> 
874
        time by executing <command>file</command> 
797
  for things that certainly should not be indexed).</para>
875
        for things that certainly should not be indexed).</para>
798
876
799
  <para>The mappings can be specified on a per-subtree basis,
877
        <para>The mappings can be specified on a per-subtree basis,
800
  which may be useful in some cases. Example:
878
        which may be useful in some cases. Example:
801
  <application>gaim</application> logs have a
879
        <application>gaim</application> logs have a
802
  <filename>.txt</filename> extension but 
880
        <filename>.txt</filename> extension but 
803
  should be handled specially, which is possible because they
881
        should be handled specially, which is possible because they
804
  are usually all located in one place.</para>
882
        are usually all located in one place.</para>
805
883
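        <para>As an illustration only, a per-subtree mapping for such
        logs might look like the following. The directory name, the
        mime type name and the exact form of the extension keys are
        assumptions made for this sketch; check the sample
        <filename>mimemap</filename> file for the actual
        values:</para>
        <programlisting>
        # General mapping for the .txt extension
        .txt = text/plain

        # Hypothetical override for the subtree holding the logs
        [~/.gaim]
        .txt = text/x-gaim-log
        </programlisting>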
806
  <para><filename>mimemap</filename> also has a
884
        <para><filename>mimemap</filename> also has a
807
  <literal>recoll_noindex</literal> variable which is a list of
885
        <literal>recoll_noindex</literal> variable which is a list of
808
  suffixes. Matching files will be skipped (avoids unnecessary
886
        suffixes. Matching files will be skipped (avoids unnecessary
809
  decompressions or <command>file</command> executions). This is
887
        decompressions or <command>file</command> executions). This is
810
  partially redundant with <literal>skippedNames</literal> in
888
        partially redundant with <literal>skippedNames</literal> in
811
  the main configuration file, with two differences: it will not
889
        the main configuration file, with two differences: it will not
812
  affect directories, and it can be changed for any
890
        affect directories, and it can be changed for any
813
  subdirectory.</para>
891
        subdirectory.</para>
814
892
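        <para>A short sketch of such a list, with suffixes chosen
        purely for illustration (they are not necessarily the ones in
        the sample file):</para>
        <programlisting>
        # Hypothetical list: skip archives and object files outright
        recoll_noindex = .tar.gz .tgz .o
        </programlisting>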
815
      </sect2>
893
      </sect2>
816
894
817
      <sect2 id="rclinstall.config.mimeconf">
895
      <sect2 id="rclinstall.config.mimeconf">
818
  <title>The mimeconf file</title>
896
        <title>The mimeconf file</title>
819
897
820
  <para><filename>~/.recoll/mimeconf</filename> specifies how the
898
        <para><filename>~/.recoll/mimeconf</filename> specifies how the
821
         different mime types are handled for indexation, and for
899
         different mime types are handled for indexation, and for
822
         display.</para>
900
         display.</para>
823
901
824
  <para>Changing the indexation parameters is probably not a
902
        <para>Changing the indexation parameters is probably not a
825
         good idea unless you are a &RCL; developer.</para>
903
         good idea unless you are a &RCL; developer.</para>
826
904
827
  <para>You may want to adjust the external viewers defined in
905
        <para>You may want to adjust the external viewers defined in
828
   this file (e.g. html is either
906
         this file (e.g. html is either
829
   previewed internally or displayed using 
907
         previewed internally or displayed using 
830
   <application>firefox</application>, but you may prefer 
908
         <application>firefox</application>, but you may prefer 
831
   <application>mozilla</application>...). Look for the
909
         <application>mozilla</application>...). Look for the
832
   <literal>[view]</literal> section.</para>
910
         <literal>[view]</literal> section.</para>
833
911
834
  <para>You can also change the icons which are displayed by
912
        <para>You can also change the icons which are displayed by
835
         <command>recoll</command> in the result lists (the values are
913
         <command>recoll</command> in the result lists (the values are
836
         the basenames of the png images inside the
914
         the basenames of the png images inside the
837
         <filename>iconsdir</filename> directory, specified in
915
         <filename>iconsdir</filename> directory, specified in
838
         <filename>recoll.conf</filename>).</para> 
916
         <filename>recoll.conf</filename>).</para> 
839
917