--- a/src/doc/user/usermanual.sgml
+++ b/src/doc/user/usermanual.sgml
@@ -20,7 +20,7 @@
</author>
<copyright>
- <year>2005</year>
+ <year>2005-2011</year>
<holder role="mailto:jfd@recoll.org">Jean-Francois
Dockes</holder>
</copyright>
@@ -197,18 +197,18 @@
<listitem>
<formalpara><title>Periodic indexing:</title>
<para>indexing takes place at discrete
- times, by executing the <command>recollindex</command>
- command. The typical usage is to have a nightly indexing run
- <link linkend="rcl.indexing.periodic.automat">programmed</link> into your
- <command>cron</command> file.</para>
+ times, by executing the <command>recollindex</command>
+ command. The typical usage is to have a nightly indexing run
+ <link linkend="rcl.indexing.periodic.automat">programmed</link>
+ into your <command>cron</command> file.</para>
</formalpara>
</listitem>
<listitem>
<formalpara><title>Real time indexing:</title>
<para>indexing takes place as soon as a file is created or
- changed. <command>recollindex</command> runs as a daemon
- and uses a file system alteration monitor such as
+ changed. <command>recollindex</command> runs as a daemon
+ and uses a file system alteration monitor such as
<application>inotify</application>,
<application>Fam</application> or
<application>Gamin</application>
@@ -218,17 +218,16 @@
</itemizedlist>
<para>The choice between the two methods is mostly a matter of
- preference, and they can be combined by setting up multiple
- indexes (ie: use periodic indexing on a big documentation
- directory, and real time indexing on a small home
- directory). Monitoring a big file system tree can consume
- significant system resources.<para>
+ preference, and they can be combined by setting up multiple
+ indexes (e.g. use periodic indexing on a big documentation
+ directory, and real time indexing on a small home
+ directory). Monitoring a big file system tree can consume
+ significant system resources.</para>
<para>&RCL; knows about quite a few different document
- types. The parameters for document types recognition and
- processing are set in
- <link linkend="rcl.indexing.config">configuration files</link>.
- </para>
+ types. The parameters for document type recognition and
+ processing are set in
+ <link linkend="rcl.indexing.config">configuration files</link>.</para>
<para>Most file types, like HTML or word processing files, only hold
one document. Some file types, like mail folder files or zip
@@ -236,25 +235,24 @@
in turn be themselves compound ones. Such hierarchies can go quite
deep, and &RCL; has no problem processing, for example, an ms-word
document which would be an attachment to an email message part of
- a folder file archived inside a zip file...
- </para>
+ a folder file archived inside a zip file...</para>
<para>&RCL; indexing processes plain text, HTML, openoffice
- and e-mail files internally (a few more actually).</para>
+ and e-mail files internally (a few more actually).</para>
<para>Other file types (ie: postscript, pdf, ms-word, rtf ...)
- need external applications for preprocessing. The list is in the
- <link linkend="rcl.install.external"> installation</link>
- section. After every indexing operation, &RCL; updates a list of
- commands that would be needed for indexing existing files
- types. This list can be displayed from the
- <command>recoll</command> <guilabel>File</guilabel> menu. It is
- stored in the <filename>missing</filename> text file
- inside the configuration directory.</para>
+ need external applications for preprocessing. The list is in the
+ <link linkend="rcl.install.external"> installation</link>
+ section. After every indexing operation, &RCL; updates a list of
+ commands that would be needed for indexing existing file
+ types. This list can be displayed from the
+ <command>recoll</command> <guilabel>File</guilabel> menu. It is
+ stored in the <filename>missing</filename> text file
+ inside the configuration directory.</para>
<para>Without further configuration, &RCL; will index all
- appropriate files from your home directory, with a reasonable
- set of defaults.</para>
+ appropriate files from your home directory, with a reasonable
+ set of defaults.</para>
<para>In some cases, it may be interesting to index different
areas of the file system to separate databases. You can do this
@@ -323,19 +321,19 @@
</itemizedlist>
<para>The size of the index is determined by the document set size,
- but the ratio can vary a lot. For a typical mixed
- set of documents, the index size will often be close to
- the data set size. In specific cases (a set of compressed
- mbox files for example), the index can become much bigger than
- the documents. It may also be much smaller if the documents
- contain a lot of images or other non-indexed data (an extreme
- example being a set of mp3 files where only the tags would be
- indexed).</para>
+ but the ratio can vary a lot. For a typical mixed
+ set of documents, the index size will often be close to
+ the data set size. In specific cases (a set of compressed
+ mbox files for example), the index can become much bigger than
+ the documents. It may also be much smaller if the documents
+ contain a lot of images or other non-indexed data (an extreme
+ example being a set of mp3 files where only the tags would be
+ indexed).</para>
<para>Of course, images, sound and video do not increase the
- index size, which means that it will be quite typical nowadays
- (2006), that even a big index will be negligible against the
- total amount of data on the computer.</para>
+ index size, which means that nowadays (2006) even a big index
+ will typically be negligible compared to the total amount of
+ data on the computer.</para>
<para>The index data directory (<filename>xapiandb</filename>)
only contains data that can be completely rebuilt by an index
@@ -385,20 +383,20 @@
<title>Security aspects</title>
<para>The &RCL; index does not hold copies of the indexed
- documents. But it does hold enough data to allow for an almost
- complete reconstruction. If confidential data is indexed,
- access to the database directory should be restricted. </para>
+ documents. But it does hold enough data to allow for an almost
+ complete reconstruction. If confidential data is indexed,
+ access to the database directory should be restricted. </para>
<para>As of version 1.4, &RCL; will create the configuration
- directory with a mode of 0700 (access by owner only). As the
- index data directory is by default a sub-directory of the
- configuration directory, this should result in appropriate
- protection.</para>
+ directory with a mode of 0700 (access by owner only). As the
+ index data directory is by default a sub-directory of the
+ configuration directory, this should result in appropriate
+ protection.</para>
<para>If you use another setup, you should think of the kind
- of protection you need for your index, set the directory
- and files access modes appropriately, and also maybe adjust
- the <literal>umask</literal> used during index updates.</para>
+ of protection you need for your index, set the directory
+ and file access modes appropriately, and possibly also adjust
+ the <literal>umask</literal> used during index updates.</para>
</sect2>
@@ -409,38 +407,38 @@
<title>Indexing configuration</title>
<para>Variables set inside the
- <link linkend="rcl.install.config">&RCL; configuration files</link>
- control which areas of the file system are indexed, and how
- files are processed. These variables can be set either by
- editing the text files or using the dialogs in the
- <command>recoll</command> GUI.</para>
+ <link linkend="rcl.install.config">&RCL; configuration files</link>
+ control which areas of the file system are indexed, and how
+ files are processed. These variables can be set either by
+ editing the text files or using the dialogs in the
+ <command>recoll</command> GUI.</para>
<para>You can also use <link linkend="rcl.search.multidb">multiple
- indexes</link> defined by separate configurations, typically to
- separate personal and shared indexes, or to take advantage of
- the organization of your data to improve search precision.</para>
+ indexes</link> defined by separate configurations, typically to
+ separate personal and shared indexes, or to take advantage of
+ the organization of your data to improve search precision.</para>
<para>The first time you start <command>recoll</command>, you
- will be asked whether or not you would like it to build the
- index. If you want to adjust the configuration before indexing,
- just click <guilabel>Cancel</guilabel> at this point, which will get
- you into the configuration interface. If you exit,
- <filename>recoll</filename> will have created a ~/.recoll directory
- containing empty configuration files, which you can edit by hand.</para>
-
- <para>The configuration is documented inside the <link
- linkend="rcl.install.config">installation chapter</link> of this
- document, or in the recoll.conf(5) man page, but the most
- current information will most likely be the comments inside the
- sample file. The most immediately useful variable you may
- interested in is probably <link
- linkend="rcl.install.config.recollconf.topdirs">topdirs</link>,
- which determines what subtrees get indexed.</para>
+ will be asked whether or not you would like it to build the
+ index. If you want to adjust the configuration before indexing,
+ just click <guilabel>Cancel</guilabel> at this point, which will get
+ you into the configuration interface. If you exit,
+ <command>recoll</command> will have created a <filename>~/.recoll</filename> directory
+ containing empty configuration files, which you can edit by hand.</para>
+
+ <para>The configuration is documented inside the
+ <link linkend="rcl.install.config">installation chapter</link>
+ of this document, or in the recoll.conf(5) man page, but the most
+ current information will most likely be the comments inside the
+ sample file. The most immediately useful variable you may
+ be interested in is probably
+ <link linkend="rcl.install.config.recollconf.topdirs">topdirs</link>,
+ which determines what subtrees get indexed.</para>
<para>The applications needed to index file types other than
- text, HTML or email (ie: pdf, postscript, ms-word...) are
- described in the <link linkend="rcl.install.external">external
- packages section</link></para>
+ text, HTML or email (e.g. pdf, postscript, ms-word...) are
+ described in the <link linkend="rcl.install.external">external
+ packages section</link>.</para>
<sect2 id="rcl.indexing.config.gui">
<title>The indexing configuration GUI</title>
@@ -510,7 +508,7 @@
<title>Periodic indexing</title>
<sect2 id="rcl.indexing.periodic.exec">
- <title>Starting indexing</title>
+ <title>Running indexing</title>
<para>Indexing is performed either by the
<command>recollindex</command> program, or by the
@@ -525,22 +523,22 @@
<command>recollindex</command> command:
<itemizedlist>
<listitem><para>Starting the indexing thread is more convenient,
- being just one click away.</para>
+ being just one click away.</para>
</listitem>
<listitem><para>The <command>recollindex</command> command has
- more options, especially the one to reset the index
- (<literal>-z</literal>).</para>
+ more options, especially the one to reset the index
+ (<literal>-z</literal>).</para>
</listitem>
<listitem><para>The <command>recollindex</command> command will
- not take down your GUI if it crashes (a rare occurrence, but who
- knows...)</para>
+ not take down your GUI if it crashes (a rare occurrence,
+ but who knows...)</para>
</listitem>
<listitem><para>The <command>recollindex</command> command uses
- <command>setpriority/nice</command> to lower its priority while
- indexing
- (it will also use <command>ionice</command> when this becomes
- more widely available), the thread can't do it, else it would
- also slow down the user/search interface.</para>
+ <command>setpriority/nice</command> to lower its priority while
+ indexing
+ (it will also use <command>ionice</command> when this becomes
+ more widely available). The thread cannot do this, as it would
+ also slow down the user/search interface.</para>
</listitem>
</itemizedlist>
I'll let the reader decide where my heart belongs...</para>
@@ -567,7 +565,24 @@
up to date will not need to be reindexed).</para>
<para><command>recollindex</command> has a number of other options
- which are described in its man page.</para>
+ which are described in its man page.</para>
+
+ <para>Of particular interest, perhaps, are the <literal>-i</literal> and
+ <literal>-f</literal> options. <literal>-i</literal> allows
+ indexing an explicit list of files (given as command line
+ parameters or read on stdin). <literal>-f</literal> tells
+ <command>recollindex</command> to ignore file selection
+ parameters from the configuration. Together, these options allow
+ building a custom file selection process for some area of the
+ file system, by adding the top directory to the
+ <literal>skippedPaths</literal> list and using an appropriate
+ file selection method to build the file list to be fed to
+ <literal>recollindex -if</literal>.</para>
+
+ <para><literal>recollindex -i</literal> will not descend into
+ directories given as parameters, but just adds them as index entries. It is
+ up to the external file selection method to build the complete
+ file list.</para>
</sect2>
<sect2 id="rcl.indexing.periodic.automat">