Recoll index format details
A comparison of index formats for recoll 1.8 and omega 1.0.1
Recoll does not stem terms before storing them. Terms are lowercased and stripped of accents, and an auxiliary database handles stem expansion. Omega stores both raw terms and stemmed versions (the latter with prefix Z).
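As an illustration of this kind of normalization (lowercasing and accent removal), here is a minimal Python sketch. It is not Recoll's actual code: the function name `normalize_term` is hypothetical, and Unicode decomposition is only one possible way to strip accents.

```python
import unicodedata


def normalize_term(term: str) -> str:
    """Lowercase a term and remove accents (illustrative sketch only)."""
    # Decompose each character so that accents become separate combining marks.
    decomposed = unicodedata.normalize("NFD", term)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


if __name__ == "__main__":
    print(normalize_term("Éléphant"))
```

With this approach a query for "elephant" and a document containing "Éléphant" would map to the same index term; stemming would then be a separate, query-time expansion step.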
Special prefixed terms:
A comparison of prefixed term usage between Recoll and Omega/Xapian. "xapian-core" in the Omega column means that Omega does not use the prefix, but that it is mentioned as allocated in the Xapian prefix definition document.
Pref. | Recoll use | Omega use |
---|---|---|
T | mime type | Same |
P | Truncated/hashed version of file path. For single-document files, and for the file part of a multi-document file. Used for up-to-date checks and for retrieving a document by path. | Path part of URL (no hashing). Uses U for the equivalent term used for up to date checks. |
Q | pathhash+ipath: same as P, plus the internal path, for documents inside multi-document files. Used to set the existence flag for subdocs when a multi-document file is found to be up to date, for deleting all subdocs of a file, and for retrieving a document by path+ipath. Compatible with the Q definition in xapian/termprefixes.txt: unique identifier. | None |
D | date: modification date of file, like YYYYMMDD | Same |
M | month: YYYYMM | Same |
Y | year YYYY | Same |
XSFN | utf8 version of file name. Used for specific file name searches | None |
U | None | Url term. Truncated/hashed version of URL. Used for duplicate checks. |
S | Subject/title | xapian-core |
A | Author | xapian-core |
K | Keyword | xapian-core |
None of the "date" terms are currently used by Recoll queries.
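To make the D, M and Y rows of the table concrete, the following sketch builds the three date terms from a file modification time. The function name `date_terms` is hypothetical. The sketch assumes that the prefix is simply prepended to the formatted date and that the time is interpreted as UTC; an actual indexer may well use local time instead.

```python
from datetime import datetime, timezone
from typing import List


def date_terms(mtime: float) -> List[str]:
    """Build D/M/Y prefixed terms from a Unix timestamp (illustrative sketch)."""
    # Assumption: UTC is used so that the result does not depend on the
    # local time zone of the machine running the example.
    dt = datetime.fromtimestamp(mtime, tz=timezone.utc)
    return [
        "D" + dt.strftime("%Y%m%d"),  # day:   DYYYYMMDD
        "M" + dt.strftime("%Y%m"),    # month: MYYYYMM
        "Y" + dt.strftime("%Y"),      # year:  YYYYY (prefix Y + 4-digit year)
    ]


if __name__ == "__main__":
    print(date_terms(0))
```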
Values
Recoll currently stores no document values.
Omega stores two values: the md5 hash of the file and the last modification date (as Unix time). The md5 value does not appear to be used currently.
Document data record format
Recoll uses the same line-based, prefixed data record format as Omega (name=value\n).
Prefix | Recoll use | Omega use |
---|---|---|
url= | Full URL. Always file://abspath. The path is not encoded to UTF-8: it is the system file name, usable as an argument to open() | Same |
mtype= | MIME type | type= |
fmtime= | file modification date | modtime= |
dmtime= | document modification date | None |
origcharset= | character set the text was converted from | None |
fbytes= | file size in bytes | size= |
dbytes= | document size in bytes | None |
ipath= | internal path for docs in multidoc files | None |
caption= | title of document, utf8 | Same |
keywords= | key words, utf8 | None |
abstract= | document abstract, utf8 | sample= |
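The name=value\n record layout described above can be read back with a few lines of code. The sketch below is not taken from Recoll or Omega: `parse_record` is a hypothetical helper, and the sample record is invented for illustration. It works on `str` for simplicity, whereas a real record may hold a url= path that is not valid UTF-8 and would have to be handled as bytes.

```python
from typing import Dict


def parse_record(data: str) -> Dict[str, str]:
    """Parse a name=value line-based data record into a dict (sketch)."""
    fields: Dict[str, str] = {}
    for line in data.split("\n"):
        # Split on the first '=' only, so values may themselves contain '='.
        name, sep, value = line.partition("=")
        if not sep:
            # Skip empty lines and lines without a name=value separator.
            continue
        fields[name] = value
    return fields


if __name__ == "__main__":
    # Invented sample record using the Recoll field names from the table.
    sample = (
        "url=file:///home/me/doc.txt\n"
        "mtype=text/plain\n"
        "fbytes=1234\n"
        "caption=Notes on a=b\n"
    )
    for key, val in parse_record(sample).items():
        print(key, "->", val)
```

Reading an Omega record with the same helper would only require looking up the Omega field names (type=, modtime=, size=, sample=) instead of the Recoll ones.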