The query language

The query language processor is activated in the GUI simple search entry when the search mode selector is set to Query Language. It can also be used with the KIO slave or the command line search. It broadly has the same capabilities as the complex search interface in the GUI.

The language was based on the now defunct Xesam user search language specification.

If the results of a query language search puzzle you and you doubt what has been actually searched for, you can use the GUI Show Query link at the top of the result list to check the exact query which was finally executed by Xapian.

Here follows a sample request that we are going to explain:

          author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes
      

This would search for all documents with John Doe appearing as a phrase in the author field (exactly what this is would depend on the document type, ie: the From: header, for an email message), and containing either beatles or lennon and either live or unplugged but not potatoes (in any part of the document).

An element is composed of an optional field specification, and a value, separated by a colon (the field separator is the last colon in the element). Examples: Eugenie, author:balzac, dc:title:grandet dc:title:"eugenie grandet"

The colon, if present, means "contains". Xesam defines other relations, which are mostly unsupported for now (except in special cases, described further down).

All elements in the search entry are normally combined with an implicit AND. It is possible to specify that elements be OR'ed instead, as in Beatles OR Lennon. The OR must be entered literally (capitals), and it has priority over the AND associations: word1 word2 OR word3 means word1 AND (word2 OR word3) not (word1 AND word2) OR word3.

Recoll versions 1.21 and later, allow using parentheses to group elements, which will sometimes make things clearer, and may allow expressing combinations which would have been difficult otherwise.

An element preceded by a - specifies a term that should not appear.

As usual, words inside quotes define a phrase (the order of words is significant), so that title:"prejudice pride" is not the same as title:prejudice title:pride, and is unlikely to find a result.

Words inside phrases and capitalized words are not stem-expanded. Wildcards may be used anywhere inside a term. Specifying a wild-card on the left of a term can produce a very slow search (or even an incorrect one if the expansion is truncated because of excessive size). Also see More about wildcards.

To save you some typing, recent Recoll versions (1.20 and later) interpret a comma-separated list of terms as an AND list inside the field. Use slash characters ('/') for an OR list. No white space is allowed. So

author:john,lennon

will search for documents with john and lennon inside the author field (in any order), and

author:john/ringo

would search for john or ringo.

Modifiers can be set on a double-quote value, for example to specify a proximity search (unordered). See the modifier section. No space must separate the final double-quote and the modifiers value, e.g. "two one"po10

Recoll currently manages the following default fields:

  • title, subject or caption are synonyms which specify data to be searched for in the document title or subject.

  • author or from for searching the documents originators.

  • recipient or to for searching the documents recipients.

  • keyword for searching the document-specified keywords (few documents actually have any).

  • filename for the document's file name. This is not necessarily set for all documents: internal documents contained inside a compound one (for example an EPUB section) do not inherit the container file name any more, this was replaced by an explicit field (see next). Sub-documents can still have a specific filename, if it is implied by the document format, for example the attachment file name for an email attachment.

  • containerfilename. This is set for all documents, both top-level and contained sub-documents, and is always the name of the filesystem directory entry which contains the data. The terms from this field can only be matched by an explicit field specification (as opposed to terms from filename which are also indexed as general document content). This avoids getting matches for all the sub-documents when searching for the container file name.

  • ext specifies the file name extension (Ex: ext:html)

Recoll 1.20 and later have a way to specify aliases for the field names, which will save typing, for example by aliasing filename to fn or containerfilename to cfn. See the section about the fields file

The document input handlers used while indexing have the possibility to create other fields with arbitrary names, and aliases may be defined in the configuration, so that the exact field search possibilities may be different for you if someone took care of the customisation.

The field syntax also supports a few field-like, but special, criteria:

  • dir for filtering the results on file location (Ex: dir:/home/me/somedir). -dir also works to find results not in the specified directory (release >= 1.15.8). Tilde expansion will be performed as usual (except for a bug in versions 1.19 to 1.19.11p1). Wildcards will be expanded, but please have a look at an important limitation of wildcards in path filters.

    Relative paths also make sense, for example, dir:share/doc would match either /usr/share/doc or /usr/local/share/doc

    Several dir clauses can be specified, both positive and negative. For example the following makes sense:

    dir:recoll dir:src -dir:utils -dir:common
                

    This would select results which have both recoll and src in the path (in any order), and which have not either utils or common.

    You can also use OR conjunctions with dir: clauses.

    A special aspect of dir clauses is that the values in the index are not transcoded to UTF-8, and never lower-cased or unaccented, but stored as binary. This means that you need to enter the values in the exact lower or upper case, and that searches for names with diacritics may sometimes be impossible because of character set conversion issues. Non-ASCII UNIX file paths are an unending source of trouble and are best avoided.

    You need to use double-quotes around the path value if it contains space characters.

  • size for filtering the results on file size. Example: size<10000. You can use <, > or = as operators. You can specify a range like the following: size>100 size<1000. The usual k/K, m/M, g/G, t/T can be used as (decimal) multipliers. Ex: size>1k to search for files bigger than 1000 bytes.

  • date for searching or filtering on dates. The syntax for the argument is based on the ISO8601 standard for dates and time intervals. Only dates are supported, no times. The general syntax is 2 elements separated by a / character. Each element can be a date or a period of time. Periods are specified as PnYnMnD. The n numbers are the respective numbers of years, months or days, any of which may be missing. Dates are specified as YYYY-MM-DD. The days and months parts may be missing. If the / is present but an element is missing, the missing element is interpreted as the lowest or highest date in the index. Examples:

    • 2001-03-01/2002-05-01 the basic syntax for an interval of dates.

    • 2001-03-01/P1Y2M the same specified with a period.

    • 2001/ from the beginning of 2001 to the latest date in the index.

    • 2001 the whole year of 2001

    • P2D/ means 2 days ago up to now if there are no documents with dates in the future.

    • /2003 all documents from 2003 or older.

    Periods can also be specified with small letters (ie: p2y).

  • mime or format for specifying the MIME type. These clauses are processed besides the normal Boolean logic of the search. Multiple values will be OR'ed (instead of the normal AND). You can specify types to be excluded, with the usual -, and use wildcards. Example: mime:text/* -mime:text/plain Specifying an explicit boolean operator before a mime specification is not supported and will produce strange results.

  • type or rclcat for specifying the category (as in text/media/presentation/etc.). The classification of MIME types in categories is defined in the Recoll configuration (mimeconf), and can be modified or extended. The default category names are those which permit filtering results in the main GUI screen. Categories are OR'ed like MIME types above, and can be negated with -.

Note

mime, rclcat, size and date criteria always affect the whole query (they are applied as a final filter), even if set with other terms inside a parenthese.

Note

mime (or the equivalent rclcat) is the only field with an OR default. You do need to use OR with ext terms for example.