Recoll index format details
Terms are not stemmed before being stored. They are converted to
lowercase and stripped of accents.
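As a rough illustration of this normalization (not Recoll's actual code), the Python sketch below lowercases a term and removes accents by Unicode decomposition. The function name normalize_term is hypothetical, and stripping combining marks is only an assumed approximation of how Recoll removes accents.

```python
import unicodedata


def normalize_term(term):
    """Lowercase a term and drop accents; no stemming is applied."""
    # Decompose accented characters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", term.lower())
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Because no stemming is done, a word such as "Running" is stored as "running", not "run".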
Special prefixed terms:
- Ddate: file modification date, formatted as YYYYMMDD
- Mmonth: YYYYMM
- Ppathhash: truncated/hashed version of the file path. Used for
single-document files, and for the file part of a
multi-document file. Serves for up-to-date checks and for
retrieving a document by path. Omega uses U for the equivalent
term in its up-to-date checks.
- Qpathhash+ipath: same as P, plus the internal path, for documents
inside multi-document files. Used to set the existence flag for
subdocs when a multi-document file is found to be up to date,
for deleting all subdocs of a file, or for retrieving a
document by path+ipath. No real omega equivalent. Compatible
with the Q definition in termprefixes.txt: unique identifier.
- Tmimetype: document mime type.
- Wweak: 10-day period (no longer used by omega)
- Yyear: YYYY
- XSFNfilename: utf8 version of the file name. Used for specific
file name searches.
Omega prefixes with no equivalents in Recoll: P, R, U
None of the "date" terms are currently used by Recoll queries.
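To make the date-term layout concrete, here is a minimal Python sketch that builds the D, M and Y terms from a modification date. The function name date_terms is hypothetical and not part of Recoll. The W term is left out because its exact layout is not described here.

```python
from datetime import date


def date_terms(mdate):
    """Return the D (YYYYMMDD), M (YYYYMM) and Y (YYYY) prefixed terms."""
    ymd = mdate.strftime("%Y%m%d")
    return [
        "D" + ymd,      # full date
        "M" + ymd[:6],  # year and month
        "Y" + ymd[:4],  # year only
    ]
```

For example, a file modified on 7 December 2006 would get the terms D20061207, M200612 and Y2006.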
Values: Recoll currently stores no document values.
Document data record format
- url= Full url. Always file://abspath. The path is not
encoded to utf-8: this is the system file name, usable as an
argument to open(). (omega: sort of same)
- mtype= mime type (omega: type)
- fmtime= file modification date (omega: modtime)
- dmtime= document modification date (omega: none)
- origcharset= character set the text was converted from
(omega: none)
- fbytes= file size in bytes (omega: size)
- dbytes= document size in bytes (omega: none)
- ipath= internal path for docs in multidoc files. (omega: none)
- caption= title of document, utf8 (omega: same)
- keywords= key words, utf8 (omega: none)
- abstract= document abstract, utf8 (omega: sample)
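The sketch below shows one way such a record could be read back in Python. It assumes that the record holds one name=value pair per line, which is not stated above, and the function name parse_data_record and the sample values are hypothetical.

```python
def parse_data_record(data):
    """Split a data record into a dict, assuming one name=value per line."""
    fields = {}
    for line in data.splitlines():
        # Split on the first '=' only, so values may contain '=' themselves.
        name, sep, value = line.partition("=")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


# Hypothetical sample record, for illustration only.
sample = "\n".join([
    "url=file:///home/me/notes.txt",
    "mtype=text/plain",
    "fbytes=1024",
    "caption=My notes",
])
record = parse_data_record(sample)
```

Under that assumption, record["mtype"] holds the mime type and record["url"] the file:// url of the document.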
Jean-Francois Dockes
Last modified: Thu Dec 7 14:19:02 CET 2006