a/src/doc/user/usermanual.sgml b/src/doc/user/usermanual.sgml
...
...
138
      languages in the same index is possible, and useful in
138
      languages in the same index is possible, and useful in
139
      practice, but does introduce possibilities of confusion. &RCL;
139
      practice, but does introduce possibilities of confusion. &RCL;
140
      currently makes no attempt at automatic language recognition.</para>
140
      currently makes no attempt at automatic language recognition.</para>
141
141
142
      <para>&RCL; has many parameters which define exactly what to
142
      <para>&RCL; has many parameters which define exactly what to
143
        index, and how to classify and decode the source
143
        index, and how to classify and decode the source documents. These
144
        documents. These are kept in <link
145
        linkend="rcl.indexing.config">configuration files</link>. A
144
        are kept in <link linkend="rcl.indexing.config">configuration
146
        default configuration is copied into a standard location
145
        files</link>. A default configuration is copied into a standard
147
        (usually something like
146
        location (usually something like
148
        <filename>/usr/[local/]share/recoll/examples</filename>)
147
        <filename>/usr/[local/]share/recoll/examples</filename>) during
149
        during installation. The default parameters from this file may
148
        installation. The default parameters from this file may be
150
        be overridden by values that you set inside your personal
149
        overridden by values that you set inside your personal
151
        configuration, found by default in the
150
        configuration, found by default in the <filename>.recoll</filename>
152
        <filename>.recoll</filename> sub-directory of your home
151
        sub-directory of your home directory. The default configuration
153
        directory. The default configuration will index your home
152
        will index your home directory with default parameters and should
154
        directory with default parameters and should be sufficient for
155
        giving &RCL; a try, but you may want to adjust it
153
        be sufficient for giving &RCL; a try, but you may want to adjust it
154
        later, which can be done either by editing the text files or by
155
        using configuration menus in the <command>recoll</command>
156
        later.</para>
156
        GUI.</para>
157
157
158
      <para><link linkend="rcl.indexing.periodic.exec">Indexing</link>
158
      <para><link linkend="rcl.indexing.periodic.exec">Indexing</link>
159
      is started automatically the first time you execute the
159
      is started automatically the first time you execute the
160
      <command>recoll</command> search graphical user interface, or by
160
      <command>recoll</command> search graphical user interface, or by
161
      executing the <command>recollindex</command> command.</para>
161
      executing the <command>recollindex</command> command.</para>
...
...
182
      <title>Introduction</title>
182
      <title>Introduction</title>
183
183
184
      <para>Indexing is the process by which the set of documents is
184
      <para>Indexing is the process by which the set of documents is
185
      analyzed and the data entered into the database. &RCL; indexing
185
      analyzed and the data entered into the database. &RCL; indexing
186
      is normally incremental: documents will only be processed if
186
      is normally incremental: documents will only be processed if
187
      they have been modified. On the first execution, of course, all
187
      they have been modified. On the first execution, all
188
      documents will need processing. A full index build can be forced
188
      documents will need processing. A full index build can be forced
189
      later by specifying an option to the indexing command
189
      later by specifying an option to the indexing command
190
      (<command>recollindex -z</command>).</para> 
190
      (<command>recollindex -z</command>).</para> 
191
191
192
      <para>&RCL; indexing can be performed with two different
192
      <para>&RCL; indexing can be performed with two different
...
...
236
        deep, and &RCL; has no problem processing, for example, an ms-word
236
        deep, and &RCL; has no problem processing, for example, an ms-word
237
        document which would be an attachment to an email message part of
237
        document which would be an attachment to an email message part of
238
        a folder file archived inside a zip file...</para>
238
        a folder file archived inside a zip file...</para>
239
239
240
      <para>&RCL; indexing processes plain text, HTML, openoffice
240
      <para>&RCL; indexing processes plain text, HTML, openoffice
241
        and e-mail files internally (a few more actually).</para>
241
        and e-mail files, and a few others internally.</para>
242
242
243
      <para>Other file types (ie: postscript, pdf, ms-word, rtf ...) 
243
      <para>Other file types (ie: postscript, pdf, ms-word, rtf ...) 
244
        need external applications for preprocessing. The list is in the
244
        need external applications for preprocessing. The list is in the
245
        <link linkend="rcl.install.external"> installation</link>
245
        <link linkend="rcl.install.external"> installation</link>
246
        section. After every indexing operation, &RCL; updates a list of
246
        section. After every indexing operation, &RCL; updates a list of
...
...
340
    run, and it can always be destroyed safely.</para>
340
    run, and it can always be destroyed safely.</para>
341
341
342
      <sect2 id="rcl.indexing.storage.format">
342
      <sect2 id="rcl.indexing.storage.format">
343
        <title>Xapian index formats</title>
343
        <title>Xapian index formats</title>
344
344
345
        <para>If your first installation of &RCL; was 1.9.0 or more
345
        <para>&XAP; versions usually support several formats for index
346
          recent, you can skip this section.</para>
346
          storage. A given major &XAP; version will have a current format,
347
347
          used to create new indexes, and will also support the format from
348
        <para>&XAP; has had two possible index formats for quite some
348
          the previous major version.</para>
349
          time. The "old" one named <literal>Quartz</literal>, and the
350
          new one named <literal>Flint</literal>. &XAP; 0.9 used
351
          <literal>Quartz</literal> by default, but could use
352
          <literal>Flint</literal> if a specific environment variable
353
          (<literal>XAPIAN_PREFER_FLINT</literal>) was set. &XAP; 1.0
354
          still supports <literal>Quartz</literal> but will use
355
          <literal>Flint</literal> by default for new index
356
          creations.</para>
357
358
        <para>The number of disk accesses performed during indexing
359
          has been much optimized in the new <literal>Flint</literal>
360
          engine and you may see indexing times improved by 50% in some
361
          cases (compared to <literal>Quartz</literal>), typically for
362
          big indexes where disk accesses dominate the indexing
363
          time. There is also a more modest improvement of index
364
          size.</para>
365
349
366
        <para>&XAP; will not automatically convert an existing index
350
        <para>&XAP; will not automatically convert an existing index
367
          from the <literal>Quartz</literal> to the
351
          from the older format to the newer one. If you want to upgrade to
368
          <literal>Flint</literal> format. If you have an older index
352
          the new format, or if a very old index needs to be converted
369
          and want to take advantage of the new format (which can be
353
          because its format is not supported any more, you will have to
370
          done without setting the environment variable as of &RCL;
371
          1.8.2 and &XAP; 1.0.0), you will have to explicitly delete
372
          the old index, then run a normal indexing process.</para>
354
          explicitly delete the old index, then run a normal indexing
355
          process.</para>
373
356
374
        <para>Unfortunately, using the <literal>-z</literal> option to
357
        <para>Unfortunately, using the <literal>-z</literal> option to
375
          <command>recollindex</command> is not sufficient to change the
358
          <command>recollindex</command> is not sufficient to change the
376
          format, you have to delete all files inside the index
359
          format, you will have to delete all files inside the index
377
          directory (typically <filename>~/.recoll/xapiandb</filename>)
360
          directory (typically <filename>~/.recoll/xapiandb</filename>)
378
          before starting indexing.</para>
361
          before starting the indexing.</para>
379
362
380
      </sect2>
363
      </sect2>
381
364
382
      <sect2 id="rcl.indexing.storage.security">
365
      <sect2 id="rcl.indexing.storage.security">
383
        <title>Security aspects</title>
366
        <title>Security aspects</title>
...
...
385
        <para>The &RCL; index does not hold copies of the indexed
368
        <para>The &RCL; index does not hold copies of the indexed
386
          documents. But it does hold enough data to allow for an almost
369
          documents. But it does hold enough data to allow for an almost
387
          complete reconstruction. If confidential data is indexed,
370
          complete reconstruction. If confidential data is indexed,
388
          access to the database directory should be restricted. </para>
371
          access to the database directory should be restricted. </para>
389
372
390
        <para>As of version 1.4, &RCL; will create the configuration
373
        <para>&RCL; (since version 1.4) will create the configuration
391
          directory with a mode of 0700 (access by owner only). As the
374
          directory with a mode of 0700 (access by owner only). As the
392
          index data directory is by default a sub-directory of the
375
          index data directory is by default a sub-directory of the
393
          configuration directory, this should result in appropriate
376
          configuration directory, this should result in appropriate
394
          protection.</para> 
377
          protection.</para> 
395
378
...
...
509
492
510
      <sect2 id="rcl.indexing.periodic.exec">
493
      <sect2 id="rcl.indexing.periodic.exec">
511
        <title>Running indexing</title>
494
        <title>Running indexing</title>
512
495
513
        <para>Indexing is performed either by the
496
        <para>Indexing is performed either by the
514
          <command>recollindex</command> program, or by the
497
          <command>recollindex</command> program, or by the indexing thread
515
          indexing thread inside the <command>recoll</command>
498
          inside the <command>recoll</command> program (start it from the
516
          program (use the <guimenu>File</guimenu> menu). Both programs
499
          <guimenu>File</guimenu> menu). Both programs will use the
517
          will use the <literal>RECOLL_CONFDIR</literal>
500
          <literal>RECOLL_CONFDIR</literal> variable or accept a
518
          variable or accept a <literal>-c</literal>
501
          <literal>-c</literal> <replaceable>confdir</replaceable> option
519
          <replaceable>confdir</replaceable> option to specify a non-default
520
          configuration directory.</para>
502
          to specify a non-default configuration directory.</para>
521
503
522
        <para>Reasons to use either the indexing thread or the
504
        <para>There are reasons to use either the indexing thread or the
523
        <command>recollindex</command> command:
505
          <command>recollindex</command> command, but it is also a matter of
506
          personal preferences:
524
          <itemizedlist>
507
          <itemizedlist>
525
            <listitem><para>Starting the indexing thread is more convenient,
508
            <listitem><para>Starting the indexing thread is more convenient,
526
                being just one click away.</para>
509
                being just one click away.</para>
527
            </listitem>
510
            </listitem>
528
            <listitem><para>The <command>recollindex</command> command has
511
            <listitem><para>The <command>recollindex</command> command has
...
...
532
            <listitem><para>The <command>recollindex</command> command will
515
            <listitem><para>The <command>recollindex</command> command will
533
                not take down your GUI if it crashes (a rare occurrence,
516
                not take down your GUI if it crashes (a rare occurrence,
534
                but who knows...)</para>
517
                but who knows...)</para>
535
            </listitem>
518
            </listitem>
536
            <listitem><para>The <command>recollindex</command> command uses
519
            <listitem><para>The <command>recollindex</command> command uses
537
                <command>setpriority/nice</command> to lower its priority while
520
                <command>setpriority/nice</command> to lower its priority
538
                indexing 
521
                while indexing. When available (and for &RCL; version
539
                (it will also use <command>ionice</command> when this becomes
522
                1.16.2 and newer), it also uses the
523
                <command>ionice</command> command to lower its IO
540
                more widely available), the thread can't do it, else it would
524
                priority. The thread can't do it, else it would also slow
541
                also slow down the user/search interface.</para>
525
                down the user/search interface.</para>
542
            </listitem>
526
            </listitem>
543
          </itemizedlist>
527
          </itemizedlist>
544
          I'll let the reader decide where my heart belongs...</para>
528
        </para>
545
529
546
        <para>If the <command>recoll</command> program finds no index
530
        <para>If the <command>recoll</command> program finds no index
547
          when it starts, it will automatically start indexing (except
531
          when it starts, it will automatically start indexing (except
548
          if canceled).</para>
532
          if canceled).</para>
549
533
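The list above says that <command>recollindex</command> lowers its own priority with <command>setpriority/nice</command>. The following is a minimal sketch of that idea, not the actual &RCL; code: the helper name `lower_cpu_priority` and the `delta` parameter are hypothetical, and I/O priority (<command>ionice</command>) is not covered.

```c
#include <errno.h>
#include <sys/resource.h>

/* Hypothetical helper (not taken from the recoll sources): raise the
 * calling process's nice value by `delta`, capped at 19, which lowers
 * its CPU scheduling priority. `delta` is expected to be >= 0, since
 * lowering the nice value usually requires privileges.
 * Returns 0 on success, -1 on failure. */
int lower_cpu_priority(int delta)
{
    /* getpriority() may legitimately return -1, so errno must be
     * cleared first and checked afterwards. */
    errno = 0;
    int current = getpriority(PRIO_PROCESS, 0);
    if (current == -1 && errno != 0)
        return -1;

    int target = current + delta;
    if (target > 19)
        target = 19;    /* 19 is the usual upper bound for niceness */

    /* A `who` value of 0 designates the calling process. */
    return setpriority(PRIO_PROCESS, 0, target) == 0 ? 0 : -1;
}
```

An indexer would call something like `lower_cpu_priority(10)` once at startup. The same call made from a thread inside the GUI process would typically slow down the interface as well, which is the drawback the list item describes.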
...
...
629
      <para>The real time indexing support can be customised during package 
613
      <para>The real time indexing support can be customised during package 
630
       <link linkend="rcl.install.building.build">configuration</link>
614
       <link linkend="rcl.install.building.build">configuration</link>
631
      with the <literal>--with[out]-fam</literal> or
615
      with the <literal>--with[out]-fam</literal> or
632
      <literal>--with[out]-inotify</literal> options.  The default is
616
      <literal>--with[out]-inotify</literal> options.  The default is
633
      currently to include inotify monitoring on systems that support
617
      currently to include inotify monitoring on systems that support
634
      it.</para>
618
      it, and, as of &RCL; 1.17, gamin support on FreeBSD.</para>
635
619
636
      <para>The <filename>rclmon.sh</filename> script can be used to
620
      <para>The <filename>rclmon.sh</filename> script can be used to
637
      easily start and stop the daemon. It can be found in the
621
      easily start and stop the daemon. It can be found in the
638
      <filename>examples</filename> directory (typically
622
      <filename>examples</filename> directory (typically
639
      <filename>/usr/local/[share/]recoll/examples</filename>).</para>
623
      <filename>/usr/local/[share/]recoll/examples</filename>).</para>
...
...
1309
1293
1310
    <sect2 id="rcl.search.sort">
1294
    <sect2 id="rcl.search.sort">
1311
      <title>Sorting search results and collapsing duplicates</title>
1295
      <title>Sorting search results and collapsing duplicates</title>
1312
1296
1313
      <para>The documents in a result list are normally sorted in
1297
      <para>The documents in a result list are normally sorted in
1314
        order of relevance. It is possible to specify different sort
1298
        order of relevance. It is possible to specify a different sort
1315
        parameters by using the <guimenu>Sort parameters</guimenu>
1299
        order, either by using the vertical arrows in the GUI toolbox to
1316
        dialog (located in the <guimenu>Tools</guimenu> menu).</para>
1300
        sort by date, or switching to the result table display and clicking
1317
1301
        on any header. The sort order chosen inside the result table
1318
      <para>The tool sorts a specified number of the most
1302
        remains active if you switch back to the result list, until you
1319
        relevant documents in the result list, according to specified
1303
        click the vertical arrows until both are unchecked (you are then
1320
        criteria. The currently available criteria are
1304
        back to sort by relevance).</para>
1321
        <emphasis>date</emphasis> and <emphasis>mime
1322
        type</emphasis>.</para>
1323
1324
      <para>The sort parameters stay in effect until they are
1325
        explicitly reset, or the program exits. An activated sort is
1326
        indicated in the result list header.</para>
1327
1305
1328
      <para>Sort parameters are remembered between program
1306
      <para>Sort parameters are remembered between program
1329
        invocations, but result sorting is normally always inactive
1307
        invocations, but result sorting is normally always inactive
1330
        when the program starts. It is possible to keep the sorting
1308
        when the program starts. It is possible to keep the sorting
1331
        activation state between program invocations by checking the
1309
        activation state between program invocations by checking the
...
...
1425
        (except <guilabel>This exact phrase</guilabel>).</para>
1403
        (except <guilabel>This exact phrase</guilabel>).</para>
1426
      </formalpara>
1404
      </formalpara>
1427
1405
1428
      <formalpara><title>AutoPhrases</title>
1406
      <formalpara><title>AutoPhrases</title>
1429
      <para>This option can be set in the preferences dialog. If it is
1407
      <para>This option can be set in the preferences dialog. If it is
1430
      set, a phrase will be automatically built and added to simple
1408
        set, a phrase will be automatically built and added to simple
1431
      searches when looking for <literal>Any terms</literal>. This
1409
        searches when looking for <literal>Any terms</literal>. This
1432
      will not radically change the results, but will give a relevance
1410
        will not radically change the results, but will give a relevance
1433
      boost to the results where the search terms appear as a
1411
        boost to the results where the search terms appear as a
1434
      phrase. Ie: searching for <literal>virtual reality</literal>
1412
        phrase. Ie: searching for <literal>virtual reality</literal>
1435
      will still find all documents where either
1413
        will still find all documents where either
1436
      <literal>virtual</literal> or <literal>reality</literal> or 
1414
        <literal>virtual</literal> or <literal>reality</literal> or 
1437
      both appear, but those which contain <literal>virtual
1415
        both appear, but those which contain <literal>virtual
1438
      reality</literal> should appear sooner in the list.</para>
1416
          reality</literal> should appear sooner in the list.</para>
1417
1418
      <para>Phrase searches can strongly slow down a query if most of the
1419
        terms in the phrase are common. This is why the
1420
        <literal>autophrase</literal> option is off by default for &RCL;
1421
        versions before 1.17. As of version 1.17,
1422
        <literal>autophrase</literal> is on by default, but very common
1423
        terms will be removed from the constructed phrase. The removal
1424
        threshold can be adjusted from the search preferences.</para>
1425
1426
      <formalpara><title>Phrases and abbreviations</title> <para>As of
1427
      &RCL; version 1.17, dotted abbreviations like
1428
      <literal>I.B.M.</literal> are also automatically indexed as a word
1429
      without the dots: <literal>IBM</literal>. Searching for the word
1430
      inside a phrase (ie: <literal>"the IBM company"</literal>) will only
1431
      match the dotted abbreviation if you increase the phrase slack (using the
1432
      advanced search panel control, or the <literal>o</literal> query
1433
      language modifier). Literal occurrences of the word will be matched
1434
      normally.</para>
1435
1439
1436
1440
      </sect3>
1437
      </sect3>
1441
1438
1442
    <sect3 id="rcl.search.tips.misc">
1439
    <sect3 id="rcl.search.tips.misc">
1443
      <title>Others</title>
1440
      <title>Others</title>
...
...
3404
              <para>Example of use for skipping text files only in a
3401
              <para>Example of use for skipping text files only in a
3405
              specific directory:</para>
3402
              specific directory:</para>
3406
              <programlisting>
3403
              <programlisting>
3407
skippedPaths = ~/somedir/&lowast;.txt
3404
skippedPaths = ~/somedir/&lowast;.txt
3408
              </programlisting>
3405
              </programlisting>
3406
                <para>The values in the <literal>*skippedPaths</literal>
3407
                variables are currently matched with
3408
                <literal>fnmatch(3)</literal>, with the FNM_PATHNAME and
3409
                FNM_LEADING_DIR flags. This means that '/' characters must
3410
                be matched explicitly, which is probably
3411
                unfortunate.</para>
3412
3409
            </listitem>
3413
            </listitem>
3410
          </varlistentry>
3414
          </varlistentry>
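The added paragraph on <literal>*skippedPaths</literal> matching can be illustrated with a direct call to <literal>fnmatch(3)</literal>. This is a sketch under assumptions, not the &RCL; implementation: the helper name `path_is_skipped` and the sample paths are hypothetical, the patterns are written as absolute paths because <literal>fnmatch</literal> does not expand <literal>~</literal> itself, and <literal>FNM_LEADING_DIR</literal> is a GNU/BSD extension rather than POSIX.

```c
#define _GNU_SOURCE   /* FNM_LEADING_DIR is a GNU/BSD extension */
#include <fnmatch.h>

/* Hypothetical helper: returns 1 if `path` matches `pattern` under the
 * two flags named in the manual text, 0 otherwise. */
int path_is_skipped(const char *pattern, const char *path)
{
    /* FNM_PATHNAME: a '/' in the path is only matched by a literal '/'
     * in the pattern, never by '*' or '?'.
     * FNM_LEADING_DIR: the pattern may match a leading part of the path
     * that is followed by '/', so a directory pattern also covers
     * everything below that directory. */
    return fnmatch(pattern, path, FNM_PATHNAME | FNM_LEADING_DIR) == 0;
}
```

With these flags, `/home/me/somedir/*.txt` matches `/home/me/somedir/a.txt`, but `/home/me/*.txt` does not, because `*` cannot cross the `/` after `somedir`. A plain directory pattern such as `/home/me/somedir` matches `/home/me/somedir/sub/a.txt` through the leading-directory rule.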
3411
3415
3412
          <varlistentry id="rcl.install.config.recollconf.followlinks">
3416
          <varlistentry id="rcl.install.config.recollconf.followlinks">
3413
            <term><literal>followLinks</literal></term>
3417
            <term><literal>followLinks</literal></term>