The query language processor is activated in the GUI simple search entry when the search mode selector is set to Query Language. It can also be used with the KIO slave or the command line search. It broadly has the same capabilities as the complex search interface in the GUI.
The language was based on the now defunct Xesam user search language specification.
If the results of a query language search puzzle you and you doubt what has actually been searched for, you can use the GUI Show Query link at the top of the result list to check the exact query which was finally executed by Xapian.
Here follows a sample request that we are going to explain:
author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes
This would search for all documents with John Doe appearing as a phrase in the author field (exactly what this is would depend on the document type, e.g. the From: header for an email message), and containing either beatles or lennon and either live or unplugged but not potatoes (in any part of the document).
An element is composed of an optional field specification and a value, separated by a colon (the field separator is the last colon in the element). Examples: Eugenie, author:balzac, dc:title:grandet, dc:title:"eugenie grandet"
The colon, if present, means "contains". Xesam defines other relations, which are mostly unsupported for now (except in special cases, described further down).
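The "last colon" rule can be illustrated with a minimal Python sketch. The function name split_element is hypothetical and this is not Recoll's actual parser; in particular, it assumes the value itself contains no colon (for example inside quotes).

```python
def split_element(element):
    """Split a query element into (field, value) on its last colon.

    Illustrative only: assumes the value itself contains no colon.
    Returns (None, element) when the element has no field specification.
    """
    field, sep, value = element.rpartition(":")
    if not sep:
        # No colon at all: the whole element is a plain value.
        return None, element
    return field, value


for sample in ["Eugenie", "author:balzac", "dc:title:grandet"]:
    print(sample, "->", split_element(sample))
```

With dc:title:grandet, everything before the last colon (dc:title) is taken as the field name, which matches the rule stated above.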
All elements in the search entry are normally combined with an implicit AND. It is possible to specify that elements be OR'ed instead, as in Beatles OR Lennon. The OR must be entered literally (capitals), and it has priority over the AND associations: word1 word2 OR word3 means word1 AND (word2 OR word3), not (word1 AND word2) OR word3.
Recoll versions 1.21 and later allow using parentheses to group elements, which will sometimes make things clearer, and may allow expressing combinations which would have been difficult otherwise.
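The priority of OR over the implicit AND can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name group_or_terms is hypothetical, elements are split on white space, and quotes, parentheses, field specifications and negation are all ignored.

```python
def group_or_terms(query):
    """Return a list of OR-groups; the groups are implicitly ANDed together.

    Only the literal, capitalized token OR is treated as an operator.
    """
    groups = []
    pending_or = False
    for token in query.split():
        if token == "OR":
            pending_or = True
        elif pending_or and groups:
            # OR binds tighter than AND: attach to the previous element.
            groups[-1].append(token)
            pending_or = False
        else:
            # Implicit AND: start a new group.
            groups.append([token])
            pending_or = False
    return groups


print(group_or_terms("word1 word2 OR word3"))
```

Each inner list is an OR group and the outer list is an AND, so word1 word2 OR word3 yields one group holding word1 and another holding word2 and word3. A lowercase "or" would simply be treated as one more search term.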
An element preceded by a - specifies a term that should not appear.
As usual, words inside quotes define a phrase (the order of words is significant), so that title:"prejudice pride" is not the same as title:prejudice title:pride, and is unlikely to find a result.
Words inside phrases and capitalized words are not stem-expanded. Wildcards may be used anywhere inside a term. Specifying a wildcard on the left of a term can produce a very slow search (or even an incorrect one if the expansion is truncated because of excessive size). Also see More about wildcards.
To save you some typing, recent Recoll versions (1.20 and later) interpret a comma-separated list of terms as an AND list inside the field. Use slash characters ('/') for an OR list. No white space is allowed. So author:john,lennon will search for documents with john and lennon inside the author field (in any order), and author:john/ringo would search for john or ringo.
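As a rough illustration of what the comma and slash shorthands stand for, the following Python sketch rewrites such an element into its long form. The function name expand_field_list is hypothetical and this is not how Recoll implements the feature. The sketch only covers ordinary fields: it does not special-case criteria such as dir or mime, whose values legitimately contain slashes, and it does not handle a mix of commas and slashes, which the text above does not describe.

```python
def expand_field_list(element):
    """Rewrite field:a,b or field:a/b into the equivalent long form.

    Commas become an implicit AND (space-separated elements),
    slashes become an explicit OR.
    """
    field, sep, value = element.rpartition(":")
    if not sep:
        return element  # no field specification, nothing to expand
    if "," in value:
        return " ".join(f"{field}:{term}" for term in value.split(","))
    if "/" in value:
        return " OR ".join(f"{field}:{term}" for term in value.split("/"))
    return element


print(expand_field_list("author:john,lennon"))
print(expand_field_list("author:john/ringo"))
```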
Modifiers can be set on a double-quoted value, for example to specify a proximity search (unordered). See the modifiers section. No space must separate the final double-quote and the modifier values, e.g. "two one"po10
Recoll currently manages the following default fields:
title, subject or caption are synonyms which specify data to be searched for in the document title or subject.
author or from for searching the document's originators.
recipient or to for searching the document's recipients.
keyword for searching the document-specified keywords (few documents actually have any).
filename for the document's file name. This is not necessarily set for all documents: internal documents contained inside a compound one (for example an EPUB section) no longer inherit the container file name; this was replaced by an explicit field (see next). Sub-documents can still have a specific filename, if it is implied by the document format, for example the attachment file name for an email attachment.
containerfilename. This is set for all documents, both top-level and contained sub-documents, and is always the name of the filesystem directory entry which contains the data. The terms from this field can only be matched by an explicit field specification (as opposed to terms from filename, which are also indexed as general document content). This avoids getting matches for all the sub-documents when searching for the container file name.
ext specifies the file name extension (Ex: ext:html)
Recoll 1.20 and later have a way to specify aliases for the field names, which will save typing, for example by aliasing filename to fn or containerfilename to cfn. See the section about the fields file.
The document input handlers used while indexing have the possibility to create other fields with arbitrary names, and aliases may be defined in the configuration, so that the exact field search possibilities may be different for you if someone took care of the customisation.
The field syntax also supports a few field-like, but special, criteria:
dir for filtering the results on file location (Ex: dir:/home/me/somedir). -dir also works to find results not in the specified directory (release >= 1.15.8). Tilde expansion will be performed as usual (except for a bug in versions 1.19 to 1.19.11p1). Wildcards will be expanded, but please have a look at an important limitation of wildcards in path filters.
Relative paths also make sense, for example, dir:share/doc would match either /usr/share/doc or /usr/local/share/doc
Several dir clauses can be specified, both positive and negative. For example the following makes sense: dir:recoll dir:src -dir:utils -dir:common
This would select results which have both recoll and src in the path (in any order), and which have neither utils nor common.
You can also use OR conjunctions with dir: clauses.
A special aspect of dir clauses is that the values in the index are not transcoded to UTF-8, and never lower-cased or unaccented, but stored as binary. This means that you need to enter the values in the exact lower or upper case, and that searches for names with diacritics may sometimes be impossible because of character set conversion issues. Non-ASCII UNIX file paths are an unending source of trouble and are best avoided.
You need to use double-quotes around the path value if it contains space characters.
size for filtering the results on file size. Example: size<10000. You can use <, > or = as operators. You can specify a range like the following: size>100 size<1000. The usual k/K, m/M, g/G, t/T can be used as (decimal) multipliers. Ex: size>1k to search for files bigger than 1000 bytes.
date for searching or filtering on dates. The syntax for the argument is based on the ISO8601 standard for dates and time intervals. Only dates are supported, no times. The general syntax is 2 elements separated by a / character. Each element can be a date or a period of time. Periods are specified as PnYnMnD. The n numbers are the respective numbers of years, months or days, any of which may be missing. Dates are specified as YYYY-MM-DD. The days and months parts may be missing. If the / is present but an element is missing, the missing element is interpreted as the lowest or highest date in the index. Examples:
2001-03-01/2002-05-01 the basic syntax for an interval of dates.
2001-03-01/P1Y2M the same specified with a period.
2001/ from the beginning of 2001 to the latest date in the index.
2001 the whole year of 2001.
P2D/ means 2 days ago up to now if there are no documents with dates in the future.
/2003 all documents from 2003 or older.
Periods can also be specified with small letters (e.g. p2y).
mime or format for specifying the MIME type. These clauses are processed separately from the normal Boolean logic of the search. Multiple values will be OR'ed (instead of the normal AND). You can specify types to be excluded, with the usual -, and use wildcards. Example: mime:text/* -mime:text/plain. Specifying an explicit boolean operator before a mime specification is not supported and will produce strange results.
type or rclcat for specifying the category (as in text/media/presentation/etc.). The classification of MIME types in categories is defined in the Recoll configuration (mimeconf), and can be modified or extended. The default category names are those which permit filtering results in the main GUI screen. Categories are OR'ed like MIME types above, and can be negated with -.
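The argument syntax of the date criterion described above can be made more concrete with a small Python sketch. It only classifies each side of the interval as a date, a period, or a missing element; it does not resolve periods into calendar dates, and it makes no claim about how Recoll itself parses the argument. The names parse_element and parse_date_argument and the tuple layout are assumptions made for this illustration.

```python
import re

# YYYY, YYYY-MM or YYYY-MM-DD: the month and day parts may be missing.
_DATE_RE = re.compile(r"^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?$")
# PnYnMnD, any part may be missing; lower-case letters are also accepted.
_PERIOD_RE = re.compile(r"^[Pp](?:(\d+)[Yy])?(?:(\d+)[Mm])?(?:(\d+)[Dd])?$")


def parse_element(text):
    """Classify one side of a date argument.

    Returns None for a missing element, ("date", (y, m, d)) with None
    for absent parts, or ("period", (years, months, days)).
    """
    if not text:
        return None  # open-ended side of the interval
    match = _DATE_RE.match(text)
    if match:
        return ("date", tuple(int(g) if g else None for g in match.groups()))
    match = _PERIOD_RE.match(text)
    if match and any(match.groups()):
        return ("period", tuple(int(g) if g else 0 for g in match.groups()))
    raise ValueError(f"not a date or period: {text!r}")


def parse_date_argument(arg):
    """Split a date argument on '/' and classify both elements."""
    if "/" not in arg:
        return ("single", parse_element(arg))
    start, _, end = arg.partition("/")
    return ("interval", parse_element(start), parse_element(end))


for sample in ["2001-03-01/P1Y2M", "2001/", "P2D/", "/2003", "2001"]:
    print(sample, "->", parse_date_argument(sample))
```

A missing element is reported as None, which a caller would then map to the lowest or highest date in the index, as explained in the description of the date criterion.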
Note: mime, rclcat, size and date criteria always affect the whole query (they are applied as a final filter), even if set with other terms inside parentheses.
Note: mime (or the equivalent rclcat) is the only field with an OR default. You do need to use OR with ext terms for example.
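Since ext terms are not OR'ed by default, a script that builds queries has to insert the OR itself. A minimal Python sketch follows; the helper name build_ext_clause is hypothetical, and the resulting string is simply meant to be typed or passed wherever a query language string is accepted.

```python
def build_ext_clause(extensions):
    """Join file name extensions into an explicit OR list of ext: terms."""
    return " OR ".join(f"ext:{ext}" for ext in extensions)


# Match either HTML or PDF files in addition to the search term.
query = "grandet " + build_ext_clause(["html", "pdf"])
print(query)
```

Because OR has priority over the implicit AND, the ext terms in the resulting query form one group which is then ANDed with the search term.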