--- a/src/doc/user/usermanual.sgml
+++ b/src/doc/user/usermanual.sgml
@@ -2324,32 +2324,75 @@
 	  handle the protocol.</para>
 	  </listitem>
 	</itemizedlist>
-      The following will just describe the simple filters, if you are
-      programmer enough to write one of the other kind, it shouldn't be too
-      difficult to make sense of one of the existing modules (ie:
-      rclzip).</para> 
+      The following will just describe the simple filters. If you can
+      program and want to write one of the other kinds, it shouldn't be too
+      difficult to make sense of one of the existing modules. For example,
+      look at <command>rclzip</command>, which uses Zip file paths as
+      internal identifiers (<literal>ipath</literal>), and
+      <command>rclinfo</command>, which uses an integer index.</para> 
+
+      <sect2 id="rcl.program.filters.simple">
+        <title>Simple filters</title>
 
       <para>&RCL; simple filters are usually shell-scripts, but this is in
-        no way necessary. These programs are extremely simple and most
-        of the difficulty lies in extracting the text from the native
-        format, not outputting what is expected by &RCL;. Happily
-        enough, most document formats already have translators or text
-        extractors which handle the difficult part and can be called
-        from the filter. In some case the output of the translating
-        program is appropriate, and no intermediate shell-script is
-        needed.</para> 
+        no way necessary. Extracting the text from the native format is the
+        difficult part. Outputting the format expected by &RCL; is
+        trivial. Happily enough, most document formats have translators or
+        text extractors which can be called from the filter. In some cases
+        the output of the translating program is completely appropriate,
+        and no intermediate shell-script is needed.</para>
 
         <para>Filters are called with a single argument which is the
         source file name. They should output the result to stdout.</para>
 
-        <para>The <literal>RECOLL_FILTER_FORPREVIEW</literal>
-        environment variable (values <literal>yes</literal>,
-        <literal>no</literal>) tells the filter if the operation is
-        for indexing or previewing. Some filters use this to output a
-        slightly different format. This is not essential.</para>
+      <para>When writing a filter, you should decide if it will output
+      plain text or HTML. Plain text is simpler, but you will not be able
+      to add metadata or vary the output character encoding (this will be
+      defined in a configuration file). Additionally, some formatting may
+      be easier to preserve when previewing HTML. Actually, the deciding factor
+      is metadata: &RCL; has a way to <link linkend="rcl.program.filters.html">
+      extract metadata from the HTML header and use it for field 
+      searches</link>.</para>
+
+      <para>The <literal>RECOLL_FILTER_FORPREVIEW</literal> environment
+        variable (values <literal>yes</literal>, <literal>no</literal>)
+        tells the filter if the operation is for indexing or
+        previewing. Some filters use this to output a slightly different
+        format, for example stripping uninteresting repeated keywords (e.g.
+        <literal>Subject:</literal> for email) when indexing. This is not
+        essential.</para>
+
+      <para>You should look at one of the simple filters, for example
+        <literal>rclps</literal>, for a starting point.</para>
+
+        <para>Don't forget to make your filter executable before 
+         testing!</para>
+
+      </sect2>
+
+      <sect2 id="rcl.program.filters.association">
+        <title>Telling &RCL; about the filter</title>
+
+      <para>There are two elements that link a file to the filter which
+      should process it: the association of the file with a mime type and
+      the association of that mime type with a filter.</para>
+
+      <para>The association of files to mime types is mostly based on
+        name suffixes. The types are defined inside the
+        <link linkend="rcl.install.config.mimemap">
+        <filename>mimemap</filename> file</link>. Example:
+<programlisting>
+
+.doc = application/msword
+</programlisting>
+       If no suffix association is found for the file name, &RCL; will try
+       to execute the <command>file -i</command> command to determine a
+       mime type.</para>
 
       <para>The association of file types to filters is performed in
-      the <filename>mimeconf</filename> file. A sample:</para>
+      the <link linkend="rcl.install.config.mimeconf">
+      <filename>mimeconf</filename> file</link>. A sample will probably be
+      more helpful than a long explanation:</para>
 <programlisting>
 
 [index]
@@ -2392,14 +2435,9 @@
 	      <literal>execm</literal> keyword.</para>
 	  </listitem>
 	</itemizedlist>
-      The easiest way to write a new filter is probably to start from an
-      existing one.</para> 
-
-      <para>Filters which output <literal>text/plain</literal> text
-      are generally simpler, but they cannot specify the character set
-      and other metadata, so they are limited to cases where these
-      elements are not needed.</para>
-
+       </para> 
+
+      </sect2>
 
     <sect2 id="rcl.program.filters.html">
         <title>Filter HTML output</title>