Recoll index format details

A comparison of index formats for Recoll 1.17 and Omega 1.0.1

Recoll terms are not stemmed before being stored. They are converted to lowercase and stripped of accents. An auxiliary database handles stem expansion. Omega stores both raw terms (with prefix R) and stemmed versions (with prefix Z). The Xapian side of the information here comes from the relevant xapian-omega documentation page.
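The normalization described above can be approximated in a few lines. This is only a sketch: the function name normalize_term is hypothetical, and the Unicode decomposition used here is an assumption standing in for whatever unaccenting routine Recoll actually applies, so results may differ for some characters.

```python
import unicodedata


def normalize_term(term: str) -> str:
    """Lowercase a term and drop combining accents (approximation only)."""
    # Decompose accented characters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", term.lower())
    # Keep everything except the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(normalize_term("Élève"))  # eleve
```

Note that no stemming happens at this point: under the scheme described above, the stored term stays unstemmed and stem expansion is left to the auxiliary database.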

Special prefixed terms:

A comparison of prefixed term usage between Recoll and omega/xapian.

| Pref. | Recoll use | Omega use |
| --- | --- | --- |
| A | Author | Same |
| B | Unused | Reserved |
| C | Unused | Reserved |
| D | Date: modification date of file, like YYYYMMDD | Same |
| E | Unused. Recoll uses XE | File name extension folded to lowercase |
| F | Unused | Reserved |
| G | Unused | Newsgroup / forum name |
| H | Unused | Host name |
| I | Unused | "Can see" |
| J | Unused | Reserved |
| K | Keyword | Same |
| L | Unused | ISO language code |
| M | Month: YYYYMM | Same |
| N | Unused | ISO country code |
| O | Unused | Owner |
| P | Unused | Path part of URL |
| Q | Unique Id. fs backend: truncated/hashed path+ipath. Other backends may use a different unique id. | Unique Id |
| R | Unused | Raw (unstemmed) term |
| S | Subject/title | Same |
| T | MIME type | Same |
| U | Unused | Full URL of indexed document. Truncated/hashed version of URL. Used for duplicate checks. |
| V | Unused | "Can't see" |
| W | Unused | Owner |
| X | Prefix for multi-character prefixes | Same |
| Y | Year: YYYY | Same |
| Z | Unused | Stemmed term |
| XE | File name extension folded to lowercase (Omega uses E) | Unused |
| XP | Path elements (for phrase-based directory filtering) | Unused |
| XSFN | UTF-8 lowercased/unaccented version of file name. Used for specific file name searches. NOT SPLIT (spaces are treated as normal characters). | None |
| XTO | Recipient | None |
| XXST | Not really a prefix: start of field marker (for anchored phrase searches) | None |
| XXND | Not really a prefix: end of field marker (for anchored phrase searches) | None |
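To make the prefix scheme more concrete, here is a minimal sketch of how the D, M, Y and XE terms for a file might be built. The function names are hypothetical. It assumes that a term is simply the prefix concatenated with the value, and it uses UTC for the date conversion; neither detail is stated in this document, so treat both as assumptions.

```python
import os
from datetime import datetime, timezone
from typing import List, Optional


def date_terms(mtime: float) -> List[str]:
    """Build D (YYYYMMDD), M (YYYYMM) and Y (YYYY) terms from a Unix mtime."""
    # Assumption: UTC is used here; an indexer may well use local time instead.
    d = datetime.fromtimestamp(mtime, tz=timezone.utc)
    return ["D" + d.strftime("%Y%m%d"),
            "M" + d.strftime("%Y%m"),
            "Y" + d.strftime("%Y")]


def extension_term(path: str) -> Optional[str]:
    """Build the XE term from the file name extension, folded to lowercase."""
    ext = os.path.splitext(path)[1]  # e.g. ".PDF", or "" when there is none
    return "XE" + ext[1:].lower() if len(ext) > 1 else None


print(date_terms(0))                      # ['D19700101', 'M197001', 'Y1970']
print(extension_term("/tmp/Report.PDF"))  # XEpdf
```

Indexing the same date at three granularities means a query for a whole month or year can match a single term instead of enumerating every day.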

Values

| Value slot | Recoll use | Omega use |
| --- | --- | --- |
| 0 | Unused | Unix modification time |
| 1 | MD5 | Same |
| 2 | Unused | Size |
| 10 | Signature: value to be checked for up-to-dateness, i.e. mtime\|size for the fs backend | Unused |

Document data record format

Recoll uses the same line-based, prefixed data record format as Omega: one name=value pair per line, each terminated by a newline (name=value\n). The Omega column below is quite out of date.

| Prefix | Recoll use | Omega use |
| --- | --- | --- |
| url= | Full URL. Always file://abspath. The path is not encoded to UTF-8: this is the system file name, usable as an argument to open() | Same |
| mtype= | MIME type (Omega: type) | type= |
| fmtime= | File modification date | modtime= |
| dmtime= | Document modification date | None |
| origcharset= | Character set the text was converted from | None |
| fbytes= | File size in bytes | size= |
| dbytes= | Document size in bytes | None |
| ipath= | Internal path for docs in multidoc files | None |
| caption= | Title of document, UTF-8 | Same |
| keywords= | Key words, UTF-8 | None |
| abstract= | Document abstract, UTF-8 | sample= |
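The name=value record layout is simple enough to read with a few lines of code. The sketch below is illustrative only: parse_record is a hypothetical helper, the sample record is invented, and it assumes the record has already been decoded to a string. Since the url field holds a raw system file name, a real reader would probably work on bytes.

```python
from typing import Dict


def parse_record(data: str) -> Dict[str, str]:
    """Split a name=value\\n data record into a dictionary of fields."""
    fields = {}
    for line in data.split("\n"):
        if not line:
            continue  # skip the empty piece after the final newline
        # Split on the first '=' only, so values may contain '=' themselves.
        name, sep, value = line.partition("=")
        if sep:
            fields[name] = value
    return fields


# Invented sample record using the Recoll field names from the table above.
sample = ("url=file:///home/me/doc.txt\n"
          "mtype=text/plain\n"
          "caption=A title = with equals\n")

record = parse_record(sample)
print(record["mtype"])    # text/plain
print(record["caption"])  # A title = with equals
```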

Jean-Francois Dockes
Last modified: Sat Feb 25 09:14:38 CEST 2012