--- a/src/README
+++ b/src/README
@@ -134,15 +134,15 @@
 
    4. Programming interface
 
-                4.1. Writing a document filter
-
-                             4.1.1. Simple filters
-
-                             4.1.2. "Multiple" filters
-
-                             4.1.3. Telling Recoll about the filter
-
-                             4.1.4. Filter HTML output
+                4.1. Writing a document input handler
+
+                             4.1.1. Simple input handlers
+
+                             4.1.2. "Multiple" handlers
+
+                             4.1.3. Telling Recoll about the handler
+
+                             4.1.4. Input handler HTML output
 
                              4.1.5. Page numbers
 
@@ -259,7 +259,7 @@
 
    Recoll stores all internal data in Unicode UTF-8 format, and it can index
    files with different character sets, encodings, and languages into the
-   same index. It has input filters for many document types.
+   same index. It can process many document types.
 
    Stemming is the process by which Recoll reduces words to their radicals so
    that searching does not depend, for example, on a word being singular or
@@ -418,13 +418,13 @@
 
    Excluding types can be done by adding wildcard name patterns to the
    skippedNames list, which can be done from the GUI Index configuration
-   menu. It is also possible to exclude a mime type independantly of the file
-   name by associating it with the rclnull filter. This can be done by
-   editing the mimeconf configuration file.
-
-   In order to define a positive list, You need to edit the main
-   configuration file (recoll.conf) and set the indexedmimetypes
-   configuration variable. Example:
+   menu. For versions 1.20 and later, you can alternatively set the
+   excludedmimetypes list in the configuration file. This can be redefined
+   for subdirectories.
+
+   You can also define an exclusive list of MIME types to be indexed (no
+   others will be indexed), by setting the indexedmimetypes configuration
+   variable. Example:
 
  indexedmimetypes = text/html application/pdf
           
@@ -436,10 +436,11 @@
           
 
    (When using sections like this, don't forget that they remain in effect
-   until the end of the file or another section indicator). There is no GUI
-   way to edit the parameter, because this option runs contrary to Recoll
-   main goal which is to help you find information, independantly of how it
-   may be stored.
+   until the end of the file or another section indicator).
+
+   Both excludedmimetypes and indexedmimetypes can be set either by editing
+   the main configuration file (recoll.conf), or from the GUI index
+   configuration tool.
 
   2.1.4. Recovery
 
@@ -702,7 +703,7 @@
 
    mime_type
 
-           If set, this overrides any other determination of the file mime
+           If set, this overrides any other determination of the file MIME
            type.
 
    charset
@@ -1018,11 +1019,11 @@
    you prefer to completely customize the choice of applications, you can
    uncheck the Use desktop preferences option in the GUI preferences dialog,
    and click the Choose editor applications button to adjust the predefined
-   Recoll choices. The tool accepts multiple selections of mime types (e.g.
+   Recoll choices. The tool accepts multiple selections of MIME types (e.g.
    to set up the editor for the dozens of office file types).
 
    Even when Use desktop preferences is checked, there is a small list of
-   exceptions, for mime types where the Recoll choice should override the
+   exceptions, for MIME types where the Recoll choice should override the
    desktop one. These are applications which are well integrated with Recoll,
    especially evince for viewing PDF and Postscript files because of its
    support for opening the document at a specific page and passing a search
@@ -1242,7 +1243,7 @@
        specifying multiple clauses which are combined to build the search.
 
     2. The second tab lets filter the results according to file size, date of
-       modification, mime type, or location.
+       modification, MIME type, or location.
 
    Click on the Start Search button in the advanced search dialog, or type
    Enter in any text field to start the search. The button in the main window
@@ -1305,8 +1306,8 @@
        can use suffix multipliers: k/K, m/M, g/G, t/T for 1E3, 1E6, 1E9, 1E12
        respectively.
 
-     o The next section allows filtering the results by their mime types, or
-       mime categories (ie: media/text/message/etc.).
+     o The next section allows filtering the results by their MIME types, or
+       MIME categories (ie: media/text/message/etc.).
 
        You can transfer the types between two boxes, to define which will be
        included or excluded by the search.
@@ -1647,7 +1648,7 @@
        an appropriate application.
 
      o Exceptions: when using the desktop preferences for opening documents,
-       these are mime types that will still be opened according to Recoll
+       these are MIME types that will still be opened according to Recoll
        preferences. This is useful for passing parameters like page numbers
        or search strings to applications that support them (e.g. evince).
        This cannot be done with xdg-open which only supports passing one
@@ -1789,7 +1790,7 @@
 
      o %D. Date
 
-     o %I. Icon image name. This is normally determined from the mime type.
+     o %I. Icon image name. This is normally determined from the MIME type.
        The associations are defined inside the mimeconf configuration file.
        If a thumbnail for the file is found at the standard Freedesktop
        location, this will be displayed instead.
@@ -1798,7 +1799,7 @@
 
      o %L. Precooked Preview, Edit, and possibly Snippets links
 
-     o %M. Mime type
+     o %M. MIME type
 
      o %N. result Number inside the result page
 
@@ -1824,7 +1825,7 @@
    stored by default, apart from the values above (only author and filename),
    so this feature will need some custom local configuration to be useful. An
    example candidate would be the recipient field which is generated by the
-   message filters.
+   message input handlers.
 
    The default value for the paragraph format string is:
 
@@ -1949,6 +1950,8 @@
      -m : dump the whole document meta[] array for each result
      -A : output the document abstracts
      -S fld : sort by field <fld>
+     -s stemlang : set stemming language to use (must exist in index...)
+        Use -s "" to turn off stem expansion
      -D : sort descending
      -i <dbdir> : additional index, several can be given
      -e use url encoding (%xx) for urls
@@ -2139,7 +2142,7 @@
 
        Periods can also be specified with small letters (ie: p2y).
 
-     o mime or format for specifying the mime type. This one is quite special
+     o mime or format for specifying the MIME type. This one is quite special
        because you can specify several values which will be OR'ed (the normal
        default for the language is AND). Ex: mime:text/plain mime:text/html.
        Specifying an explicit boolean operator before a mime specification is
@@ -2149,11 +2152,11 @@
        with an OR default. You do need to use OR with ext terms for example.
 
      o type or rclcat for specifying the category (as in
-       text/media/presentation/etc.). The classification of mime types in
+       text/media/presentation/etc.). The classification of MIME types in
        categories is defined in the Recoll configuration (mimeconf), and can
        be modified or extended. The default category names are those which
        permit filtering results in the main GUI screen. Categories are OR'ed
-       like mime types above. This can't be negated with - either.
+       like MIME types above. This can't be negated with - either.
 
    Words inside phrases and capitalized words are not stem-expanded.
    Wildcards may be used anywhere inside a term. Specifying a wild-card on
@@ -2161,9 +2164,9 @@
    one if the expansion is truncated because of excessive size). Also see
    More about wildcards.
 
-   The document filters used while indexing have the possibility to create
-   other fields with arbitrary names, and aliases may be defined in the
-   configuration, so that the exact field search possibilities may be
+   The document input handlers used while indexing have the possibility to
+   create other fields with arbitrary names, and aliases may be defined in
+   the configuration, so that the exact field search possibilities may be
    different for you if someone took care of the customisation.
 
   3.5.1. Modifiers
@@ -2378,81 +2381,91 @@
    Recoll has an Application Programming Interface, usable both for indexing
    and searching, currently accessible from the Python language.
 
-   Another less radical way to extend the application is to write filters for
-   new types of documents.
+   Another less radical way to extend the application is to write input
+   handlers for new types of documents.
 
    The processing of metadata attributes for documents (fields) is highly
    configurable.
 
-4.1. Writing a document filter
-
-   Recoll filters cooperate to translate from the multitude of input document
-   formats, simple ones as opendocument, acrobat), or compound ones such as
-   Zip or Email, into the final Recoll indexing input format, which may be
-   text/plain or text/html. Most filters are executable programs or scripts.
-   A few filters are coded in C++ and live inside recollindex. This latter
+4.1. Writing a document input handler
+
+  Terminology
+
+   The small programs or pieces of code which handle the processing of the
+   different document types for Recoll used to be called filters, which is
+   still reflected in the name of the directory which holds them and many
+   configuration variables. They were named this way because one of their
+   primary functions is to filter out the formatting directives and keep the
+   text content. However, these modules may have other behaviours, and the
+   term input handler is progressively replacing it in the documentation.
+   The term filter is still used in many places though.
+
+   Recoll input handlers cooperate to translate from the multitude of input
+   document formats, simple ones (opendocument, acrobat), or compound ones
+   such as Zip or Email, into the final Recoll indexing input format, which
+   is plain text. Most input handlers are executable programs or scripts. A
+   few handlers are coded in C++ and live inside recollindex. This latter
    kind will not be described here.
 
    There are currently (1.18 and since 1.13) two kinds of external executable
-   filters:
-
-     o Simple filters (exec filters) run once and exit. They can be bare
-       programs like antiword, or scripts using other programs. They are very
-       simple to write, because they just need to print the converted
-       document to the standard output. Their output can be text/plain or
-       text/html.
-
-     o Multiple filters (execm filters), run as long as their master process
-       (recollindex) is active. They can process multiple files (sparing the
+   input handlers:
+
+     o Simple exec handlers run once and exit. They can be bare programs like
+       antiword, or scripts using other programs. They are very simple to
+       write, because they just need to print the converted document to the
+       standard output. Their output can be plain text or HTML. HTML is
+       usually preferred because it can store metadata fields and it allows
+       preserving some of the formatting for the GUI preview.
+
+     o Multiple execm handlers can process multiple files (sparing the
        process startup time which can be very significant), or multiple
        documents per file (e.g.: for zip or chm files). They communicate with
        the indexer through a simple protocol, but are nevertheless a bit more
-       complicated than the older kind. Most of new filters are written in
+       complicated than the older kind. Most new handlers are written in
        Python, using a common module to handle the protocol. There is an
        exception, rclimg which is written in Perl. The subdocuments output by
-       these filters can be directly indexable (text or HTML), or they can be
-       other simple or compound documents that will need to be processed by
-       another filter.
-
-   In both cases, filters deal with regular file system files, and can
+       these handlers can be directly indexable (text or HTML), or they can
+       be other simple or compound documents that will need to be processed
+       by another handler.
+
+   In both cases, handlers deal with regular file system files, and can
    process either a single document, or a linear list of documents in each
    file. Recoll is responsible for performing up to date checks, deal with
    more complex embedding and other upper level issues.
 
-   In the extreme case of a simple filter returning a document in text/plain
-   format, no metadata can be transferred from the filter to the indexer.
-   Generic metadata, like document size or modification date, will be
-   gathered and stored by the indexer.
-
-   Filters that produce text/html format can return an arbitrary amount of
+   A simple handler returning a document in text/plain format cannot
+   transfer any metadata to the indexer. Generic metadata, like document size
+   or modification date, will be gathered and stored by the indexer.
+
+   Handlers that produce text/html format can return an arbitrary amount of
    metadata inside HTML meta tags. These will be processed according to the
    directives found in the fields configuration file.
 
-   The filters that can handle multiple documents per file return a single
+   The handlers that can handle multiple documents per file return a single
    piece of data to identify each document inside the file. This piece of
    data, called an ipath element will be sent back by Recoll to extract the
    document at query time, for previewing, or for creating a temporary file
    to be opened by a viewer.
 
-   The following section describes the simple filters, and the next one gives
-   a few explanations about the execm ones. You could conceivably write a
-   simple filter with only the elements in the manual. This will not be the
-   case for the other ones, for which you will have to look at the code.
-
-  4.1.1. Simple filters
-
-   Recoll simple filters are usually shell-scripts, but this is in no way
+   The following section describes the simple handlers, and the next one
+   gives a few explanations about the execm ones. You could conceivably write
+   a simple handler with only the elements in the manual. This will not be
+   the case for the other ones, for which you will have to look at the code.
+
+  4.1.1. Simple input handlers
+
+   Recoll simple handlers are usually shell-scripts, but this is in no way
    necessary. Extracting the text from the native format is the difficult
    part. Outputting the format expected by Recoll is trivial. Happily enough,
    most document formats have translators or text extractors which can be
-   called from the filter. In some cases the output of the translating
+   called from the handler. In some cases the output of the translating
    program is completely appropriate, and no intermediate shell-script is
    needed.
 
-   Filters are called with a single argument which is the source file name.
-   They should output the result to stdout.
-
-   When writing a filter, you should decide if it will output plain text or
+   Input handlers are called with a single argument which is the source file
+   name. They should output the result to stdout.
+
+   When writing a handler, you should decide if it will output plain text or
    HTML. Plain text is simpler, but you will not be able to add metadata or
    vary the output character encoding (this will be defined in a
    configuration file). Additionally, some formatting may be easier to
@@ -2461,25 +2474,26 @@
    field searches..
 
    The RECOLL_FILTER_FORPREVIEW environment variable (values yes, no) tells
-   the filter if the operation is for indexing or previewing. Some filters
+   the handler if the operation is for indexing or previewing. Some handlers
    use this to output a slightly different format, for example stripping
    uninteresting repeated keywords (ie: Subject: for email) when indexing.
    This is not essential.
 
-   You should look at one of the simple filters, for example rclps for a
+   You should look at one of the simple handlers, for example rclps for a
    starting point.
 
-   Don't forget to make your filter executable before testing !
-
-  4.1.2. "Multiple" filters
-
-   If you can program and want to write an execm filter, it should not be too
-   difficult to make sense of one of the existing modules. For example, look
-   at rclzip which uses Zip file paths as identifiers (ipath), and rclics,
-   which uses an integer index. Also have a look at the comments inside the
-   internfile/mh_execm.h file and possibly at the corresponding module.
-
-   execm filters sometimes need to make a choice for the nature of the ipath
+   Don't forget to make your handler executable before testing!
+
+  4.1.2. "Multiple" handlers
+
+   If you can program and want to write an execm handler, it should not be
+   too difficult to make sense of one of the existing modules. For example,
+   look at rclzip which uses Zip file paths as identifiers (ipath), and
+   rclics, which uses an integer index. Also have a look at the comments
+   inside the internfile/mh_execm.h file and possibly at the corresponding
+   module.
+
+   execm handlers sometimes need to make a choice for the nature of the ipath
    elements that they use in communication with the indexer. Here are a few
    guidelines:
 
@@ -2491,34 +2505,34 @@
 
      o Recoll uses a colon (:) as a separator to store a complex path
        internally (for deeper embedding). Colons inside the ipath elements
-       output by a filter will be escaped, but would be a bad choice as a
-       filter-specific separator (mostly, again, for debugging issues).
-
-   In any case, the main goal is that it should be easy for the filter to
+       output by a handler will be escaped, but would be a bad choice as a
+       handler-specific separator (mostly, again, for debugging issues).
+
+   In any case, the main goal is that it should be easy for the handler to
    extract the target document, given the file name and the ipath element.
 
-   execm filters will also produce a document with a null ipath element.
+   execm handlers will also produce a document with a null ipath element.
    Depending on the type of document, this may have some associated data
    (e.g. the body of an email message), or none (typical for an archive
    file). If it is empty, this document will be useful anyway for some
    operations, as the parent of the actual data documents.
 
-  4.1.3. Telling Recoll about the filter
-
-   There are two elements that link a file to the filter which should process
-   it: the association of file to mime type and the association of a mime
-   type with a filter.
-
-   The association of files to mime types is mostly based on name suffixes.
+  4.1.3. Telling Recoll about the handler
+
+   There are two elements that link a file to the handler which should
+   process it: the association of file to MIME type and the association of a
+   MIME type with a handler.
+
+   The association of files to MIME types is mostly based on name suffixes.
    The types are defined inside the mimemap file. Example:
 
 
  .doc = application/msword
 
    If no suffix association is found for the file name, Recoll will try to
-   execute the file -i command to determine a mime type.
-
-   The association of file types to filters is performed in the mimeconf
+   execute the file -i command to determine a MIME type.
+
+   The association of file types to handlers is performed in the mimeconf
    file. A sample will probably be of better help than a long explanation:
 
 
@@ -2545,10 +2559,10 @@
        iso-8859-1 encoding is specified because it is not the utf-8 default,
        and not output by unrtf in the HTML header section.
 
-     o application/x-chm is processed by a persistant filter. This is
+     o application/x-chm is processed by a persistent handler. This is
        determined by the execm keyword.
 
-  4.1.4. Filter HTML output
+  4.1.4. Input handler HTML output
 
    The output HTML could be very minimal like the following example:
 
@@ -2600,8 +2614,8 @@
  <meta name="date" content="2013-02-24 17:50:00">
           
 
-   Filters also have the possibility to "invent" field names. This should
-   also be output as meta tags:
+   Input handlers also have the possibility to "invent" field names. These
+   should also be output as meta tags:
 
  <meta name="somefield" content="Some textual data" />
 
@@ -2617,10 +2631,10 @@
 
   4.1.5. Page numbers
 
-   The indexer will interpret ^L characters in the filter output as
+   The indexer will interpret ^L characters in the handler output as
    indicating page breaks, and will record them. At query time, this allows
    starting a viewer on the right page for a hit or a snippet. Currently,
-   only the PDF, Postscript and DVI filters generate page breaks.
+   only the PDF, Postscript and DVI handlers generate page breaks.
 
 4.2. Field data processing
 
@@ -2628,14 +2642,14 @@
    author, abstract.
 
    The field values for documents can appear in several ways during indexing:
-   either output by filters as meta fields in the HTML header section, or
-   extracted from file extended attributes, or added as attributes of the Doc
-   object when using the API, or again synthetized internally by Recoll.
+   either output by input handlers as meta fields in the HTML header section,
+   or extracted from file extended attributes, or added as attributes of the
+   Doc object when using the API, or synthesized internally by Recoll.
 
    The Recoll query language allows searching for text in a specific field.
 
    Recoll defines a number of default fields. Additional ones can be output
-   by filters, and described in the fields configuration file.
+   by handlers, and described in the fields configuration file.
 
    Fields can be:
 
@@ -2794,7 +2808,7 @@
 
         The Db class
 
-   A Db object is created by a connect() function and holds a connection to a
+   A Db object is created by a connect() call and holds a connection to a
    Recoll index.
 
    Methods
@@ -3088,7 +3102,7 @@
    text file inside the configuration directory.
 
    A list of common file types which need external commands follows. Many of
-   the filters need the iconv command, which is not always listed as a
+   the handlers need the iconv command, which is not always listed as a
    dependancy.
 
    Please note that, due to the relatively dynamic nature of this
@@ -3103,7 +3117,7 @@
    http://www.recoll.org/features.html if a file type is important to you.
 
    As of Recoll release 1.14, a number of XML-based formats that were handled
-   by ad hoc filter code now use the xsltproc command, which usually comes
+   by ad hoc handler code now use the xsltproc command, which usually comes
    with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.
 
    Now for the list:
@@ -3121,7 +3135,7 @@
        it may be be used as a fallback for some files which antiword does not
        handle.
 
-     o MS Excel and PowerPoint need catdoc.
+     o MS Excel and PowerPoint are processed by internal Python handlers.
 
      o MS Open XML (docx) needs xsltproc.
 
@@ -3140,11 +3154,8 @@
 
      o djvu files need djvutxt and djvused from the DjVuLibre package.
 
-     o Audio files: Recoll releases before 1.13 used the id3info command from
-       the id3lib package to extract mp3 tag information, metaflac (standard
-       flac tools) for flac files, and ogginfo (vorbis tools) for ogg files.
-       Releases 1.14 and later use a single Python filter based on mutagen
-       for all audio file types.
+     o Audio files: Recoll releases 1.14 and later use a single Python
+       handler based on mutagen for all audio file types.
 
      o Pictures: Recoll uses the Exiftool Perl package to extract tag
        information. Most image file formats are supported. Note that there
@@ -3152,7 +3163,7 @@
        aperture, etc.). This is only of interest if you store personal tags
        or textual descriptions inside the image files.
 
-     o chm: files in microsoft help format need Python and the pychm module
+     o chm: files in Microsoft help format need Python and the pychm module
        (which needs chmlib).
 
      o ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar
@@ -3168,11 +3179,11 @@
 
      o Konqueror webarchive format with Python (uses the Tarfile module).
 
-     o mimehtml web archive format (support based on the email filter, which
+     o Mimehtml web archive format (support based on the email handler, which
        introduces some mild weirdness, but still usable).
 
    Text, HTML, email folders, and Scribus files are processed internally. Lyx
-   is used to index Lyx files. Many filters need iconv and the standard sed
+   is used to index Lyx files. Many handlers need iconv and the standard sed
    and awk.
 
 5.3. Building from source
@@ -3495,10 +3506,10 @@
 
            A space-separated list of patterns for names of files or
            directories that should be ignored inside zip archives. This is
-           used directly by the zip filter, and has a function similar to
+           used directly by the zip handler, and has a function similar to
            skippedNames, but works independantly. Can be redefined for
            filesystem subdirectories. For versions up to 1.19, you will need
-           to update the Zip filter and install a supplementary Python
+           to update the Zip handler and install a supplementary Python
            module. The details are described on the Recoll wiki.
 
    followLinks
@@ -3513,10 +3524,15 @@
    indexedmimetypes
 
            Recoll normally indexes any file which it knows how to read. This
-           list lets you restrict the indexed mime types to what you specify.
+           list lets you restrict the indexed MIME types to what you specify.
            If the variable is unspecified or the list empty (the default),
            all supported types are processed. Can be redefined for
            subdirectories.
+
+   excludedmimetypes
+
+           This list lets you exclude some MIME types from indexing. Can be
+           redefined for subdirectories.
 
    compressedfilemaxkbs
 
@@ -3550,14 +3566,14 @@
            Recoll indexes file names in a special section of the database to
            allow specific file names searches using wild cards. This
            parameter decides if file name indexing is performed only for
-           files with mime types that would qualify them for full text
+           files with MIME types that would qualify them for full text
            indexing, or for all files inside the selected subtrees,
-           independently of mime type.
+           independently of MIME type.
 
    usesystemfilecommand
 
            Decide if we use the file -i system command as a final step for
-           determining the mime type for a file (the main procedure uses
+           determining the MIME type for a file (the main procedure uses
            suffix associations as defined in the mimemap file). This can be
            useful for files with suffix-less names, but it will also cause
            the indexing of many bogus "text" files.
@@ -3770,6 +3786,9 @@
 
            This is only used by the web browser plugin indexing code, and
            defines the maximum size for the web page cache. Default: 40 MB.
+           Unfortunately, this is only taken into account when creating the
+           cache file. You need to delete the file for a change to take
+           effect.
 
    idxflushmb
 
@@ -3909,15 +3928,15 @@
 
    filtermaxseconds
 
-           Maximum filter execution time, after which it is aborted. Some
+           Maximum handler execution time, after which it is aborted. Some
            postscript programs just loop...
 
    filtersdir
 
-           A directory to search for the external filter scripts used to
-           index some types of files. The value should not be changed, except
-           if you want to modify one of the default scripts. The value can be
-           redefined for any sub-directory.
+           A directory to search for the external input handler scripts used
+           to index some types of files. The value should not be changed,
+           except if you want to modify one of the default scripts. The value
+           can be redefined for any sub-directory.
 
    iconsdir
 
@@ -3998,17 +4017,17 @@
            This section defines lists of synonyms for the canonical names
            used inside the [prefixes] and [stored] sections
 
-   filter-specific sections
-
-           Some filters may need specific configuration for handling fields.
-           Only the email message filter currently has such a section (named
-           [mail]). It allows indexing arbitrary email headers in addition to
-           the ones indexed by default. Other such sections may appear in the
-           future.
+   handler-specific sections
+
+           Some input handlers may need specific configuration for handling
+           fields. Only the email message handler currently has such a
+           section (named [mail]). It allows indexing arbitrary email headers
+           in addition to the ones indexed by default. Other such sections
+           may appear in the future.
 
    Here follows a small example of a personal fields file. This would extract
    a specific email header and use it as a searchable field, with data
-   displayable inside result lists. (Side note: as the email filter does no
+   displayable inside result lists. (Side note: as the email handler does no
    decoding on the values, only plain ascii headers can be indexed, and only
    the first occurrence will be used for headers that occur several times).
 
@@ -4040,10 +4059,10 @@
 
   5.4.3. The mimemap file
 
-   mimemap specifies the file name extension to mime type mappings.
+   mimemap specifies the file name extension to MIME type mappings.
 
    For file names without an extension, or with an unknown one, the system's
-   file -i command will be executed to determine the mime type (this can be
+   file -i command will be executed to determine the MIME type (this can be
    switched off inside the main configuration file).
 
    The mappings can be specified on a per-subtree basis, which may be useful
@@ -4064,7 +4083,7 @@
 
   5.4.4. The mimeconf file
 
-   mimeconf specifies how the different mime types are handled for indexing,
+   mimeconf specifies how the different MIME types are handled for indexing,
    and which icons are displayed in the recoll result lists.
 
    Changing the parameters in the [index] section is probably not a good idea
@@ -4088,7 +4107,7 @@
    Recoll GUI preferences, all mimeview entries will be ignored except the
    one labelled application/x-all (which is set to use xdg-open by default).
 
-   In this case, the xallexcepts top level variable defines a list of mime
+   In this case, the xallexcepts top level variable defines a list of MIME
    type exceptions which will be processed according to the local entries
    instead of being passed to the desktop. This is so that specific Recoll
    options such as a page number or a search string can be passed to
@@ -4101,13 +4120,13 @@
 
    All viewer definition entries must be placed under a [view] section.
 
-   The keys in the file are normally mime types. You can add an application
+   The keys in the file are normally MIME types. You can add an application
    tag to specialize the choice for an area of the filesystem (using a
    localfields specification in mimeconf). The syntax for the key is
    mimetype|tag
 
    The nouncompforviewmts entry, (placed at the top level, outside of the
-   [view] section), holds a list of mime types that should not be
+   [view] section), holds a list of MIME types that should not be
    uncompressed before starting the viewer (if they are found compressed, ie:
    mydoc.doc.gz).
 
@@ -4127,7 +4146,7 @@
        will not create a temporary file to extract the subdocument, expecting
        the called application (possibly a script) to be able to handle it.
 
-     o %M. Mime type
+     o %M. MIME type
 
      o %p. Page index. Only significant for a subset of document types,
        currently only PDF, Postscript and DVI files. Can be used to start the
@@ -4180,7 +4199,7 @@
 
  .blob = application/x-blobapp
 
-       Note that the mime type is made up here, and you could call it
+       Note that the MIME type is made up here, and you could call it
        diesel/oil just the same.
 
      o In $RECOLL_CONFDIR/mimeview under the [view] section, add:
@@ -4191,7 +4210,7 @@
        would use %u if it liked URLs better.
 
    If you just wanted to change the application used by Recoll to display a
-   mime type which it already knows, you would just need to edit mimeview.
+   MIME type which it already knows, you would just need to edit mimeview.
    The entries you add in your personal file override those in the central
    configuration, which you do not need to alter. mimeview can also be
    modified from the Gui.
@@ -4213,14 +4232,14 @@
        for the files inside the result lists. Icons are normally 64x64 pixels
        PNG files which live in /usr/[local/]share/recoll/images.
 
-     o Under the [categories] section, you should add the mime type where it
+     o Under the [categories] section, you should add the MIME type where it
        makes sense (you can also create a category). Categories may be used
        for filtering in advanced search.
 
-   The rclblob filter should be an executable program or script which exists
+   The rclblob handler should be an executable program or script which exists
    inside /usr/[local/]share/recoll/filters. It will be given a file name as
    argument and should output the text or html contents on the standard
    output.
 
-   The filter programming section describes in more detail how to write a
-   filter.
+   The filter programming section describes in more detail how to write an
+   input handler.