--- a/src/doc/user/usermanual.xml
+++ b/src/doc/user/usermanual.xml
@@ -50,18 +50,23 @@
<sect1 id="RCL.INTRODUCTION.TRYIT">
<title>Giving it a try</title>
- <para>If you do not like reading manuals (who does?) and would like
- to give &RCL; a try, just <link
- linkend="RCL.INSTALL.BINARY">install</link> the application and
- start the <command>recoll</command> graphical user interface (GUI),
- which will ask to index your home directory by default, allowing
- you to search immediately after indexing completes.</para>
+ <para>If you do not like reading manuals (who does?) but
+ wish to give &RCL; a try, just <link
+ linkend="RCL.INSTALL.BINARY">install</link> the application
+ and start the <command>recoll</command> graphical user
+ interface (GUI), which will ask permission to index your home
+ directory by default, allowing you to search immediately after
+ indexing completes.</para>
<para>Do not do this if your home directory contains a huge
number of documents and you do not want to wait or are very
short on disk space. In this case, you may first want to customize
the <link linkend="RCL.INDEXING.CONFIG">configuration</link>
- to restrict the indexed area.</para>
+ to restrict the indexed area (for the very impatient who have
+ completed a package install: in the <command>recoll</command> GUI,
+ open <menuchoice><guimenu>Preferences</guimenu>
+ <guimenuitem>Indexing configuration</guimenuitem></menuchoice>,
+ then adjust the <guilabel>Top directories</guilabel> section).</para>
<para>Also be aware that you may need to install the
appropriate <link linkend="RCL.INSTALL.EXTERNAL"> supporting
@@ -74,12 +79,12 @@
<title>Full text search</title>
<para>&RCL; is a full text search application. Full text search
- applications let you find your data by content rather
- than by external attributes (like a file name). More
- specifically, they will let you specify words (terms) that
- should or should not appear in the text you are looking for,
- and return a list of matching documents, ordered so that the
- most <emphasis>relevant</emphasis> documents will appear
+ finds your data by content rather than by external attributes
+ (like a file name). You specify words
+ (terms) which should or should not appear in the text you are
+ looking for, and receive in return a list of matching
+ documents, ordered so that the most
+ <emphasis>relevant</emphasis> documents will appear
first.</para>
<para>You do not need to remember in what file or email message you
@@ -88,27 +93,30 @@
these terms are prominent, in a similar way to Internet search
engines.</para>
- <para>A search application tries to determine which documents are
- most relevant to the search terms you provide. Computer algorithms
- for determining relevance can be very complex, and in general are
- inferior to the power of the human mind to rapidly determine
- relevance. The quality of relevance guessing is probably the most
- important aspect when evaluating a search application.</para>
-
- <para>In many cases, you are looking for all the forms of a
- word, not for a specific form or spelling. These different forms
- may include plurals, different tenses for a verb, or terms derived
- from the same root or <emphasis>stem</emphasis> (example: floor,
- floors, floored, flooring...). Search applications usually expand
- queries to all such related terms (words that reduce to the same
- stem) and also provide a way to disable this expansion if you are
- actually searching for a specific form.</para>
-
- <para>Stemming, by itself, does not accommodate for misspellings or
- phonetic searches. &RCL; supports these features through a specific
- tool (the <literal>term explorer</literal>) which will let you
- explore the set of index terms along different modes.</para>
-
+ <para>Full text search applications try to determine which
+ documents are most relevant to the search terms you
+ provide. Computer algorithms for determining relevance can be
+ very complex, and in general are inferior to the power of the
+ human mind to rapidly determine relevance. The quality of
+ relevance guessing is probably the most important aspect when
+ evaluating a search application.</para>
+
+ <para>In many cases, you are looking for all the forms of a
+ word, including plurals, different tenses for a verb, or terms
+ derived from the same root or <emphasis>stem</emphasis>
+ (example: <replaceable>floor, floors, floored,
+ flooring...</replaceable>). Queries are usually automatically
+ expanded to all such related terms (words that reduce to the
+ same stem). This expansion can be prevented when searching for
+ a specific form.</para>
+
+ <para>Stemming, by itself, does not accommodate misspellings
+ or phonetic searches. A full text search application may also
+ support these forms of approximation. For example, a search for
+ <replaceable>aliterattion</replaceable> returning no result may
+ propose, depending on index contents, <replaceable>alliteration
+ alteration alterations altercation</replaceable> as possible
+ replacement terms.</para>
</sect1>
@@ -120,14 +128,25 @@
library as its storage and retrieval engine. &XAP; is a very
mature package using <ulink
url="http://www.xapian.org/docs/intro_ir.html">a sophisticated
- probabilistic ranking model</ulink>. &RCL; provides the mechanisms
- and interface to get data into and out of the system.</para>
-
- <para>In practice, &XAP; works by remembering where terms appear
- in your document files. The acquisition process is called
- indexing. </para>
-
- <para>The resulting index can be big (roughly the size of the
+ probabilistic ranking model</ulink>.</para>
+
+ <para>The &XAP; library manages an index database which
+ describes where terms appear in your document files. It
+ efficiently processes the complex queries which are produced by
+ the &RCL; query expansion mechanism, and is in charge of the
+ all-important relevance computation task.</para>
+
+ <para>&RCL; provides the mechanisms and interface to get data
+ into and out of the index. This includes translating the many
+ possible document formats into pure text, handling term
+ variations (using &XAP; stemmers) and spelling approximations
+ (using the <application>aspell</application> speller),
+ interpreting user queries, and presenting results.</para>
+
+ <para>In short, &RCL; does the dirty footwork, and &XAP;
+ deals with the intelligent parts of the process.</para>
+
+ <para>The &XAP; index can be big (roughly the size of the
original document set), but it is not a document
archive. &RCL; can only display documents that still exist at
the place from which they were indexed. (Actually, there is a
@@ -136,9 +155,12 @@
punctuation and capitalization are lost).</para>
<para>&RCL; stores all internal data in <application>Unicode
- UTF-8</application> format, and it can index files with
- different character sets, encodings, and languages into the same
- index. It has can process many document types.</para>
+ UTF-8</application> format, and it can index files of many types
+ with different character sets, encodings, and languages into the
+ same index. It can process documents embedded inside other
+ documents (for example a PDF document stored inside a Zip
+ archive sent as an email attachment...), down to an arbitrary
+ depth.</para>
<para>Stemming is the process by which &RCL; reduces words to
their radicals so that searching does not depend, for example, on a
@@ -206,9 +228,12 @@
<para>The <link linkend="RCL.INDEXING.PERIODIC.EXEC">indexing
process</link> is started automatically the first time you
- execute the <command>recoll</command> GUI. Indexing can also be
- performed by executing the <command>recollindex</command>
- command.</para>
+ execute the <command>recoll</command> GUI. Indexing can also
+ be performed by executing the <command>recollindex</command>
+ command. &RCL; indexing is multithreaded by default when
+ appropriate hardware resources are available, and can run
+ text extraction, segmentation and index updates in
+ parallel.</para>
<para><link linkend="RCL.SEARCH">Searches</link> are usually
performed inside the <command>recoll</command> GUI, which has many
@@ -220,7 +245,10 @@
<application>Python</application>
programming interface</link>, a <link linkend="RCL.SEARCH.KIO">
<application>KDE</application> KIO slave module</link>, and
- a <ulink url="&WIKI;UnityLens">Ubuntu Unity Lens</ulink> module.
+ Ubuntu Unity <ulink url="https://bitbucket.org/medoc/unity-lens-recoll">
+ Lens</ulink> (for older versions) or
+ <ulink url="https://bitbucket.org/medoc/unity-scope-recoll">
+ Scope</ulink> (for current versions) modules.
</para>
</sect1>
@@ -236,11 +264,11 @@
<para>Indexing is the process by which the set of documents is
analyzed and the data entered into the database. &RCL;
indexing is normally incremental: documents will only be
- processed if they have been modified. On the first execution,
- all documents will need processing. A full index build can be
- forced later by specifying an option to the indexing command
- (<command>recollindex</command> <option>-z</option>
- or <option>-Z</option>).</para>
+ processed if they have been modified since the last run. On
+ the first execution, all documents will need processing. A
+ full index build can be forced later by specifying an option
+ to the indexing command (<command>recollindex</command>
+ <option>-z</option> or <option>-Z</option>).</para>
<para>The following sections give an overview of different
aspects of the indexing processes and configuration, with links
@@ -1852,6 +1880,11 @@
inflections). But there are other cases where the exact search
term is not known. For example, you may not remember the exact
spelling, or only know the beginning of the name.</para>
+
+ <para>The search will only propose replacement terms with
+ spelling variations when no matching documents were found. In
+ some cases, both proper spellings and misspellings are present in
+ the index, and it may be useful to look for them explicitly.</para>
<para>The term explorer tool (started from the toolbar icon or
from the <guilabel>Term explorer</guilabel> entry of the
@@ -4636,9 +4669,11 @@
<listitem><para>Openoffice files need <command>unzip</command> and
<command>xsltproc</command>.</para></listitem>
- <listitem><para>PDF files need <command>pdftotext</command> which
- is part of the <application>Xpdf</application> or
- <application>Poppler</application> packages.</para></listitem>
+ <listitem><para>PDF files need <command>pdftotext</command>,
+ which is part of <application>Poppler</application> (usually
+ shipped in the <literal>poppler-utils</literal>
+ package). Avoid the original version from
+ <application>Xpdf</application>.</para></listitem>
<listitem><para>Postscript files need <command>pstotext</command>.
The original version has an issue with shell
@@ -4663,9 +4698,11 @@
<application>libwpd-tools</application> on Ubuntu)
package.</para></listitem>
- <listitem><para>RTF files need <command>unrtf</command>, which, in
- its standard version, has much trouble with non-western character
- sets. Check &RCLAPPS;.</para></listitem>
+ <listitem><para>RTF files need <command>unrtf</command>,
+ which, in its older versions, has much trouble with
+ non-western character sets. Many Linux distributions carry
+ outdated <command>unrtf</command> versions. Check
+ &RCLAPPS; for details.</para></listitem>
<listitem><para>TeX files need <command>untex</command> or
<command>detex</command>. Check &RCLAPPS; for sources if it's not