--- a/src/doc/user/usermanual.sgml
+++ b/src/doc/user/usermanual.sgml
@@ -24,7 +24,7 @@
Dockes</holder>
</copyright>
- <releaseinfo>$Id: usermanual.sgml,v 1.35 2007-01-15 13:03:35 dockes Exp $</releaseinfo>
+ <releaseinfo>$Id: usermanual.sgml,v 1.36 2007-01-25 15:47:45 dockes Exp $</releaseinfo>
<abstract>
<para>This document introduces full text search notions
@@ -178,7 +178,7 @@
is normally incremental: documents will only be processed if
they have been modified. On the first execution, of course, all
documents will need processing. A full index build can be forced
- later on by specifying an option to the indexing command
+ later by specifying an option to the indexing command
(<command>recollindex -z</command>).</para>
<para>&RCL; indexing can be performed with two different
@@ -486,7 +486,7 @@
</chapter>
<chapter id="rcl.search">
- <title>Search</title>
+ <title>Searching</title>
<para>The <command>recoll</command> program provides the user
interface for searching. It is based on the
@@ -510,19 +510,27 @@
</step>
</procedure>
- <para>The initial default search mode is <guilabel>Any
- term</guilabel>. This will look for documents with any of the
- search terms (the ones with more terms will get better scores).
- <guilabel>All terms</guilabel> will ensure
- that only documents with all the terms will be
- returned. <guilabel>File name</guilabel> will specifically
- look for file names, and allows using wildcards
- (<literal>*</literal>, <literal>?</literal> ,
- <literal>[]</literal>). </para>
+ <para>The initial default search mode is <guilabel>All
+ terms</guilabel>. This will look for documents containing all
+ of the search terms. <guilabel>Any term</guilabel> will search
+ for documents where at least one of the terms appears (the
+ ones with more terms will get better scores). <guilabel>File
+ name</guilabel> will specifically look for file names.</para>
+
+ <para>The fourth entry (<guilabel>Query Language</guilabel>) is
+ described in <link linkend="rcl.search.lang">its own
+ section</link>.</para>
+
+ <para>All search modes allow wildcards inside terms
+ (<literal>*</literal>, <literal>?</literal>,
+ <literal>[]</literal>). See the
+ <link linkend="rcl.search.wildcards">section about wildcards</link>
+ for more details.</para>
<para>You can search for exact phrases (adjacent words in a
given order) by enclosing the input inside double quotes. Ex:
<literal>"virtual reality"</literal>.</para>
+
<para>Character case has no influence on search, except that you
can disable stem expansion for any term by capitalizing it. Ie:
a search for <literal>floor</literal> will also normally look for
@@ -537,7 +545,7 @@
text field). Please note, however, that only the search texts
are remembered, not the mode (all/any/file name).</para>
- <para>Typing <keycap>Esc</keycap> <keycap>Space</keycap>) while
+ <para>Typing <keycap>Esc</keycap> <keycap>Space</keycap> while
entering a word in the simple search entry will open a window
with possible completions for the word. The completions are
extracted from the database.</para>
@@ -568,7 +576,10 @@
tabs in the existing preview window. You can use
<keycap>Shift</keycap>+Click to force the creation of another
preview window, which may be useful to view the documents side
- by side.</para>
+ by side. You can also browse successive results in a single
+ preview window by pressing
+ <keycap>Shift</keycap>+<keycap>ArrowUp/Down</keycap> in the
+ window.</para>
<para>Clicking the <literal>Edit</literal> link will attempt to
start an external viewer. The viewers can be configured through the
@@ -618,19 +629,17 @@
<para>The <guilabel>Preview</guilabel> and
<guilabel>Edit</guilabel> entries do the same thing as the
- corresponding links. The two following entries will copy either
- an URL or the file path to the clipboard, for pasting into
- another application.</para>
+ corresponding links.</para>
+
+ <para>The <guilabel>Copy File Name</guilabel> and
+ <guilabel>Copy Url</guilabel> entries copy the file path or the
+ URL to the clipboard, for later pasting.</para>
<para>The <guilabel>Find similar</guilabel> entry will select
a number of relevant term from the current document and enter
them into the simple search field. You can then start a simple
search, with a good chance of finding documents related to the
current result.</para>
-
- <para>The <guilabel>Copy File Name</guilabel> and
- <guilabel>Copy Url</guilabel> copy the relevant data to the
- clipboard, for later pasting.</para>
<para>The <guilabel>Parent document</guilabel> entry will
appear for documents which are not actually files but are
@@ -653,7 +662,9 @@
<literal>Preview</literal> link inside the result list.</para>
<para>Subsequent preview requests for a given search open new
- tabs in the existing window.</para>
+ tabs in the existing window (unless you hold the
+ <keycap>Shift</keycap> key while clicking, which will open a new
+ window for side by side viewing).</para>
<para>Starting another search and requesting a preview will
create a new preview window. The old one stays open until you
@@ -690,12 +701,93 @@
</sect1>
+ <sect1 id="rcl.search.lang">
+ <title>The query language</title>
+
+ <para>The query language processor is activated on the
+ simple search entry when the search mode selector is set to
+ <guilabel>Query Language</guilabel>.</para>
+
+ <para>Here is a sample request, which is explained
+ below:</para>
+ <programlisting>
+ mime:message/rfc822 author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes
+ </programlisting>
+
+ <para>This would search for all email messages with
+ <replaceable>John Doe</replaceable>
+ appearing as a phrase in the <literal>From:</literal> header,
+ and containing either <replaceable>beatles</replaceable> or
+ <replaceable>lennon</replaceable> and either
+ <replaceable>live</replaceable> or
+ <replaceable>unplugged</replaceable> but not
+ <replaceable>potatoes</replaceable>.</para>
+
+ <para>The first element, <literal>mime:message/rfc822</literal>,
+ is a special switch that restricts the results to email
+ messages. There can be several such switches, which together
+ form a list of allowed types.</para>
+
+ <para>The second element, <literal>author:"john doe"</literal>, is
+ a phrase search limited to a specific field. Phrase searches are
+ specified as usual by enclosing the words in double quotes. The
+ field specification appears before the colon. &RCL; currently
+ supports the following fields:</para>
+ <itemizedlist>
+ <listitem><para><literal>title</literal>,
+ <literal>subject</literal> or <literal>caption</literal> are
+ synonyms which specify data to be searched for in the
+ document title or subject.</para>
+ </listitem>
+ <listitem><para><literal>author</literal> or
+ <literal>from</literal> for searching the document originators.</para>
+ </listitem>
+ <listitem><para><literal>keyword</literal> for searching the
+ keywords specified by the document (few documents actually have any).</para>
+ </listitem>
+ </itemizedlist>
+
+ <para>The query language is currently the only way to use the
+ &RCL; field search capability.</para>
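+
+ <para>As a further illustration, the following request would
+ restrict the results to HTML or PDF documents (two
+ <literal>mime:</literal> switches forming a list of allowed
+ types) with the phrase <replaceable>release notes</replaceable>
+ in the title. The phrase and the types are arbitrary examples
+ chosen only to show how the elements combine:</para>
+ <programlisting>
+ mime:text/html mime:application/pdf title:"release notes"
+ </programlisting>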
+
+ <para>All elements in the search entry are normally combined
+ with an implicit AND. It is possible to specify that elements be
+ OR'ed instead, as in <replaceable>Beatles</replaceable>
+ <literal>OR</literal> <replaceable>Lennon</replaceable>. The
+ <literal>OR</literal> must be entered literally (in capitals), and
+ it has priority over the AND associations:
+ <replaceable>word1</replaceable>
+ <replaceable>word2</replaceable> <literal>OR</literal>
+ <replaceable>word3</replaceable>
+ means
+ <replaceable>word1</replaceable> AND
+ (<replaceable>word2</replaceable> <literal>OR</literal>
+ <replaceable>word3</replaceable>),
+ not
+ (<replaceable>word1</replaceable> AND
+ <replaceable>word2</replaceable>) <literal>OR</literal>
+ <replaceable>word3</replaceable>. Do not enter explicit
+ parentheses: they are not supported for now.</para>
+
+ <para>An entry preceded by a <literal>-</literal> specifies a
+ term that should <emphasis>not</emphasis> appear.</para>
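+
+ <para>For example, the following request would return documents
+ containing <replaceable>budget</replaceable> and at least one
+ of <replaceable>report</replaceable> or
+ <replaceable>summary</replaceable>, but not
+ <replaceable>draft</replaceable>. The words are arbitrary
+ examples. The <literal>OR</literal> binds
+ <replaceable>report</replaceable> and
+ <replaceable>summary</replaceable> before the implicit AND is
+ applied:</para>
+ <programlisting>
+ budget report OR summary -draft
+ </programlisting>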
+
+ <para>Words inside phrases and capitalized words are not
+ stem-expanded. Wildcards may be used anywhere.</para>
+
+ <para>You can use the <literal>show query</literal> link at the
+ top of the result list to check the exact query which was
+ finally executed by Xapian.</para>
+
+ </sect1>
+
<sect1 id="rcl.search.complex">
<title>Complex/advanced search</title>
- <para>The advanced search dialog has fields that will allow a more
- refined search. It has a number of entry fields, each of which
- is configurable for the following modes:
+ <para>The advanced search dialog has a number of fields that
+ will allow a more refined search. Each entry field is
+ configurable for the following modes:</para>
+
<itemizedlist>
<listitem><para>All terms.</para>
</listitem>
@@ -712,16 +804,17 @@
<listitem><para>Filename search with wildcards.</para>
</listitem>
</itemizedlist>
- </para>
+
<para>Additional entry fields can be created by clicking the
<guilabel>Add clause</guilabel> button.</para>
- <para>All relevant fields will be combined by an implicit AND
- or OR conjunction. All types of clauses except "phrase" and
- "near" can accept a mix of single words and phrases enclosed
- in double quotes. Stemming expansion will be performed for all
- terms not beginning with a capital letter, except for "phrase"
- clauses.</para>
+ <para>You can choose to combine all relevant fields with
+ either an AND or an OR conjunction. All types of clauses
+ except "phrase" and "near" can accept a mix of single words and
+ phrases enclosed in double quotes. Stemming expansion will be
+ performed for all terms not beginning with a capital letter,
+ except for terms inside "phrase" clauses. Wildcards will be
+ processed everywhere.</para>
<para>Advanced search will also let you search for documents of
specific mime types (ie: only <literal>text/plain</literal>, or
@@ -764,18 +857,26 @@
<varlistentry>
<term>Wildcard</term>
<listitem><para>In this mode of operation, you can enter a
- search string with shell-like wildcards (*, ?). ie:
- <replaceable>xapi*</replaceable> .</para></listitem>
+ search string with shell-like wildcards (*, ?, []). Ex:
+ <replaceable>xapi*</replaceable> would display all index terms
+ beginning with <replaceable>xapi</replaceable>. (More
+ about wildcards <link
+ linkend="rcl.search.wildcards">here</link>.)</para></listitem>
</varlistentry>
<varlistentry>
<term>Regular expression</term>
<listitem><para>This mode will accept a regular expression
as input. Example:
- <replaceable>word[0-9]+</replaceable> . The regular
- expression is anchored by enclosing in
- <literal>^</literal> and <literal>$</literal> before
- execution.</para></listitem>
+ <replaceable>word[0-9]+</replaceable>. The expression is
+ implicitly anchored at the beginning. For example,
+ <replaceable>press</replaceable> will match
+ <replaceable>pression</replaceable> but not
+ <replaceable>expression</replaceable>. You can use
+ <replaceable>.*press</replaceable> to match the latter,
+ but be aware that this will cause a full index term list
+ scan, which can be slow.</para>
+ </listitem>
</varlistentry>
<varlistentry>
@@ -815,6 +916,53 @@
</sect1>
+ <sect1 id="rcl.search.wildcards">
+ <title>More about wildcards</title>
+ <para>All words entered in &RCL; search fields will be processed
+ for wildcard expansion before the request is finally
+ executed.</para>
+
+ <para>The wildcard characters are:</para>
+
+ <itemizedlist>
+ <listitem><para><literal>*</literal> which matches 0 or more
+ characters.</para>
+ </listitem>
+ <listitem><para><literal>?</literal> which matches
+ a single character.</para>
+ </listitem>
+ <listitem><para><literal>[]</literal> which allows
+ defining sets of characters to be matched (ex:
+ <literal>[</literal><userinput>abc</userinput><literal>]</literal>
+ matches a single character which may be 'a' or 'b' or 'c',
+ <literal>[</literal><userinput>0-9</userinput><literal>]</literal>
+ matches any single digit).</para>
+ </listitem>
+ </itemizedlist>
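+
+ <para>For example, assuming that a pattern has to match a whole
+ index term (as with shell wildcards) and that the index contains
+ the terms shown, the following patterns would match as
+ indicated:</para>
+ <programlisting>
+ recoll*      matches recoll and recollindex
+ colo?r       matches colour, but not color
+ file[0-9]    matches file1, but not file10
+ </programlisting>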
+
+ <para>You should be aware of a few things before using
+ wildcards.</para>
+
+ <itemizedlist>
+ <listitem><para>Using a wildcard character at the beginning of
+ a word can make for a slow search because &RCL; will have to
+ scan the whole index term list to find the matches.</para>
+ </listitem>
+ <listitem><para>Using a <literal>*</literal> at the end of a
+ word can produce more matches than you would expect, and
+ surprising search results. You can use the <link
+ linkend="rcl.search.termexplorer">term explorer</link> tool to
+ check what completions exist for a given term. You can also
+ see the exact search performed by clicking the <literal>show
+ query</literal> link at the top of the result list. For natural
+ language terms, stem expansion generally gives better results
+ than an ending <literal>*</literal> (stem expansion is turned
+ off when any wildcard character appears in the term).</para>
+ </listitem>
+ </itemizedlist>
+
+ </sect1>
+
<sect1 id="rcl.search.multidb">
<title>Multiple databases</title>
@@ -861,14 +1009,14 @@
<para>A typical usage scenario for the multiple index feature
would be for a system administrator to set up a central index
- for shared data, that you may choose to search, or not, in
- addition to your personal data. Of course, there are other
+ for shared data, which you can choose to search or not in
+ addition to your personal data. Of course, there are other
possibilities. There are many cases where you know the subset of
- files that you want to be searched for a given query, and where
- restricting the query will much improve the precision of the
- results. This can also be performed with the directory filter in
- advanced search, but multiple indexes will have much better
- performance and may be worth the trouble.</para>
+ files that should be searched, and where narrowing the search
+ can improve the results. You can achieve approximately the same
+ effect with the directory filter in advanced search, but
+ multiple indexes will have much better performance and may be
+ worth the trouble.</para>
</sect1>
@@ -1167,10 +1315,10 @@
<filename>/usr/local/recollglobal/xapiandb</filename>).</para>
<para>Once entered, the indexes will appear in the
- <guilabel>All indexes</guilabel> list, and you can
- chose which ones you want to use at any moment by transferring
- them to/from the <guilabel>Active indexes</guilabel>
- list.</para>
+ <guilabel>External indexes</guilabel> list, and you can
+ choose which ones you want to use at any moment by checking or
+ unchecking their entries.</para>
+
<para>Your main database (the one the current configuration
indexes to), is always implicitly active. If this is not
desirable, you can set up your configuration so that it indexes,
@@ -1292,8 +1440,11 @@
</listitem>
</itemizedlist>
- <para>Text, HTML, mail folders and Openoffice files are
- processed internally.</para>
+ <para>Text, HTML, mail folders, Openoffice and Scribus files
+ are processed internally. Lyx is used to index Lyx files. Many
+ filters need <command>sed</command> and <command>awk</command>.
+ </para>
+
</sect1>