|
a/src/README |
|
b/src/README |
|
... |
|
... |
9 |
<jean-francois.dockes@wanadoo.fr>
|
9 |
<jean-francois.dockes@wanadoo.fr>
|
10 |
|
10 |
|
11 |
Copyright (c) 2005 Jean-Francois Dockes
|
11 |
Copyright (c) 2005 Jean-Francois Dockes
|
12 |
|
12 |
|
13 |
This document introduces full text search notions and describes the
|
13 |
This document introduces full text search notions and describes the
|
14 |
installation and use of the Recoll application.
|
14 |
installation and use of the Recoll application. It currently describes
|
|
|
15 |
Recoll 1.9.
|
15 |
|
16 |
|
16 |
[ Split HTML / Single HTML ]
|
17 |
[ Split HTML / Single HTML ]
|
17 |
|
18 |
|
18 |
----------------------------------------------------------------------
|
19 |
----------------------------------------------------------------------
|
19 |
|
20 |
|
|
... |
|
... |
102 |
4.4.3. The mimeconf file
|
103 |
4.4.3. The mimeconf file
|
103 |
|
104 |
|
104 |
4.4.4. The mimeview file
|
105 |
4.4.4. The mimeview file
|
105 |
|
106 |
|
106 |
4.4.5. Examples of configuration adjustments
|
107 |
4.4.5. Examples of configuration adjustments
|
|
|
108 |
|
|
|
109 |
4.5. Extending Recoll
|
|
|
110 |
|
|
|
111 |
4.5.1. Writing a document filter
|
107 |
|
112 |
|
108 |
----------------------------------------------------------------------
|
113 |
----------------------------------------------------------------------
|
109 |
|
114 |
|
110 |
Chapter 1. Introduction
|
115 |
Chapter 1. Introduction
|
111 |
|
116 |
|
|
... |
|
... |
368 |
configuration before indexing, just click Cancel at this point. That way,
|
373 |
configuration before indexing, just click Cancel at this point. That way,
|
369 |
recoll will have created a ~/.recoll directory containing empty
|
374 |
recoll will have created a ~/.recoll directory containing empty
|
370 |
configuration files.
|
375 |
configuration files.
|
371 |
|
376 |
|
372 |
The configuration is documented inside the installation chapter of this
|
377 |
The configuration is documented inside the installation chapter of this
|
373 |
document, or in the recoll.conf(5) man page. The most immediately useful
|
378 |
document, or in the recoll.conf(5) man page, but the most current
|
374 |
variable you may be interested in is probably topdirs, which determines what
|
379 |
information will most likely be the comments inside the sample file. The
|
375 |
subtrees get indexed.
|
380 |
most immediately useful variable you may be interested in is probably
|
|
|
381 |
topdirs, which determines what subtrees get indexed.
|
376 |
|
382 |
|
377 |
The applications needed to index file types other than text, HTML or email
|
383 |
The applications needed to index file types other than text, HTML or email
|
378 |
(e.g. PDF, PostScript, MS Word...) are described in the external packages
|
384 |
(e.g. PDF, PostScript, MS Word...) are described in the external packages
|
379 |
section.
|
385 |
section.
|
380 |
|
386 |
|
|
... |
|
... |
658 |
the author field (exactly what this is would depend on the document type,
|
664 |
the author field (exactly what this is would depend on the document type,
|
659 |
e.g. the From: header, for an email message), and containing either beatles
|
665 |
e.g. the From: header, for an email message), and containing either beatles
|
660 |
or lennon and either live or unplugged but not potatoes (in any part of
|
666 |
or lennon and either live or unplugged but not potatoes (in any part of
|
661 |
the document).
|
667 |
the document).
|
662 |
|
668 |
|
663 |
The first element author:"john doe" is a phrase search limited to a
|
|
|
664 |
specific field. Phrase searches are specified as usual by enclosing the
|
|
|
665 |
words in double quotes. The field specification appears before the colon
|
|
|
666 |
(of course this is not limited to phrases, author:Balzac would be ok too).
|
|
|
667 |
Recoll currently manages the following fields:
|
|
|
668 |
|
|
|
669 |
* title, subject or caption are synonyms which specify data to be
|
|
|
670 |
searched for in the document title or subject.
|
|
|
671 |
|
|
|
672 |
* author or from for searching the document's originators.
|
|
|
673 |
|
|
|
674 |
* keyword for searching the document specified keywords (few documents
|
|
|
675 |
actually have any).
|
|
|
676 |
|
|
|
677 |
The query language is currently the only way to use the Recoll field
|
|
|
678 |
search capability.
|
|
|
679 |
|
|
|
680 |
All elements in the search entry are normally combined with an implicit
|
669 |
All elements in the search entry are normally combined with an implicit
|
681 |
AND. It is possible to specify that elements be OR'ed instead, as in
|
670 |
AND. It is possible to specify that elements be OR'ed instead, as in
|
682 |
Beatles OR Lennon. The OR must be entered literally (capitals), and it has
|
671 |
Beatles OR Lennon. The OR must be entered literally (capitals), and it has
|
683 |
priority over the AND associations: word1 word2 OR word3 means word1 AND
|
672 |
priority over the AND associations: word1 word2 OR word3 means word1 AND
|
684 |
(word2 OR word3) not (word1 AND word2) OR word3. Do not enter explicit
|
673 |
(word2 OR word3) not (word1 AND word2) OR word3. Do not enter explicit
|
685 |
parentheses; they are not supported for now.
|
674 |
parentheses; they are not supported for now.
|
686 |
|
675 |
|
687 |
An entry preceded by a - specifies a term that should not appear.
|
676 |
An entry preceded by a - specifies a term that should not appear.
|
688 |
|
677 |
|
|
|
678 |
The first element in the above example, author:"john doe", is a phrase
|
|
|
679 |
search limited to a specific field. Phrase searches are specified as usual
|
|
|
680 |
by enclosing the words in double quotes. The field specification appears
|
|
|
681 |
before the colon (of course this is not limited to phrases, author:Balzac
|
|
|
682 |
would be ok too). Recoll currently manages the following fields:
|
|
|
683 |
|
|
|
684 |
* title, subject or caption are synonyms which specify data to be
|
|
|
685 |
searched for in the document title or subject.
|
|
|
686 |
|
|
|
687 |
* author or from for searching the document's originators.
|
|
|
688 |
|
|
|
689 |
* keyword for searching the document specified keywords (few documents
|
|
|
690 |
actually have any).
|
|
|
691 |
|
|
|
692 |
As of release 1.9, the filters have the possibility to create other fields
|
|
|
693 |
with arbitrary names. No standard filters use this possibility yet.
|
|
|
694 |
|
|
|
695 |
There are two other elements which may be specified through the field
|
|
|
696 |
syntax, but are somewhat special:
|
|
|
697 |
|
|
|
698 |
* ext for specifying the file name extension (Ex: ext:html)
|
|
|
699 |
|
|
|
700 |
* mime for specifying the mime type. This one is quite special because
|
|
|
701 |
you can specify several values which will be OR'ed (the normal default
|
|
|
702 |
for the language is AND). Ex: mime:text/plain mime:text/html.
|
|
|
703 |
Specifying an explicit boolean operator or negation (-) before a mime
|
|
|
704 |
specification is not supported and will produce strange results.
|
|
|
705 |
|
|
|
706 |
The query language is currently the only way to use the Recoll field
|
|
|
707 |
search capability.
|
|
|
708 |
|
689 |
Words inside phrases and capitalized words are not stem-expanded.
|
709 |
Words inside phrases and capitalized words are not stem-expanded.
|
690 |
Wildcards may be used anywhere.
|
710 |
Wildcards may be used anywhere inside a term. Specifying a wildcard on
|
|
|
711 |
the left of a term can produce a very slow search.
|
691 |
|
712 |
|
692 |
You can use the show query link at the top of the result list to check the
|
713 |
You can use the show query link at the top of the result list to check the
|
693 |
exact query which was finally executed by Xapian.
|
714 |
exact query which was finally executed by Xapian.
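
The grouping rules described above (implicit AND, OR binding more tightly
than AND, a leading - for exclusion) can be illustrated with a small Python
sketch. This is not the Recoll parser: the function name group_query is made
up for the illustration, quoted phrases and field prefixes are not handled,
and the only purpose is to show how the terms end up grouped.

```python
def group_query(query):
    """Group terms the way the text describes: OR binds tighter than AND.

    Returns (and_groups, excluded). Each entry of and_groups is a list of
    OR'ed alternatives; the entries themselves are AND'ed together.
    Hypothetical helper for illustration only, quoted phrases are ignored.
    """
    and_groups = []
    excluded = []
    expect_alternative = False  # True right after a literal OR
    for tok in query.split():
        if tok == "OR":  # must be entered in capitals
            expect_alternative = True
            continue
        if tok.startswith("-") and len(tok) > 1:
            excluded.append(tok[1:])  # term that should not appear
        elif expect_alternative and and_groups:
            and_groups[-1].append(tok)  # attach to the previous term
        else:
            and_groups.append([tok])  # new AND'ed element
        expect_alternative = False
    return and_groups, excluded


if __name__ == "__main__":
    # word1 AND (word2 OR word3), not (word1 AND word2) OR word3
    print(group_query("word1 word2 OR word3"))
```

Because parentheses are not supported, the only way to get the other
grouping is to reorder or split the query.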
|
694 |
|
715 |
|
695 |
----------------------------------------------------------------------
|
716 |
----------------------------------------------------------------------
|
|
... |
|
... |
871 |
----------------------------------------------------------------------
|
892 |
----------------------------------------------------------------------
|
872 |
|
893 |
|
873 |
3.9. Document history
|
894 |
3.9. Document history
|
874 |
|
895 |
|
875 |
Documents that you actually view (with the internal preview or an external
|
896 |
Documents that you actually view (with the internal preview or an external
|
876 |
tool) are entered into the document history, which is remembered. You can
|
897 |
tool) are entered into the document history, which is remembered.
|
|
|
898 |
|
877 |
display the history list by using the Tools/Doc History menu entry.
|
899 |
You can display the history list by using the Tools/Doc History menu
|
|
|
900 |
entry.
|
|
|
901 |
|
|
|
902 |
You can erase the document history by using the Erase document history
|
|
|
903 |
entry in the File menu.
|
878 |
|
904 |
|
879 |
----------------------------------------------------------------------
|
905 |
----------------------------------------------------------------------
|
880 |
|
906 |
|
881 |
3.10. Sorting search results
|
907 |
3.10. Sorting search results
|
882 |
|
908 |
|
|
... |
|
... |
888 |
result list, according to specified criteria. The currently available
|
914 |
result list, according to specified criteria. The currently available
|
889 |
criteria are date and mime type.
|
915 |
criteria are date and mime type.
|
890 |
|
916 |
|
891 |
The sort parameters stay in effect until they are explicitly reset, or the
|
917 |
The sort parameters stay in effect until they are explicitly reset, or the
|
892 |
program exits. An activated sort is indicated in the result list header.
|
918 |
program exits. An activated sort is indicated in the result list header.
|
|
|
919 |
|
|
|
920 |
Sort parameters are remembered between program invocations, but result
|
|
|
921 |
sorting is normally always inactive when the program starts. It is
|
|
|
922 |
possible to keep the sorting activation state between program invocations
|
|
|
923 |
by checking the Remember sort activation state option in the preferences.
|
893 |
|
924 |
|
894 |
----------------------------------------------------------------------
|
925 |
----------------------------------------------------------------------
|
895 |
|
926 |
|
896 |
3.11. Search tips, shortcuts
|
927 |
3.11. Search tips, shortcuts
|
897 |
|
928 |
|
|
... |
|
... |
982 |
|
1013 |
|
983 |
* %A. Abstract
|
1014 |
* %A. Abstract
|
984 |
|
1015 |
|
985 |
* %D. Date
|
1016 |
* %D. Date
|
986 |
|
1017 |
|
|
|
1018 |
* %I. Icon image name
|
|
|
1019 |
|
987 |
* %K. Keywords (if any)
|
1020 |
* %K. Keywords (if any)
|
988 |
|
1021 |
|
989 |
* %L. Preview and Edit links
|
1022 |
* %L. Preview and Edit links
|
990 |
|
1023 |
|
991 |
* %M. Mime type
|
1024 |
* %M. Mime type
|
|
... |
|
... |
1000 |
|
1033 |
|
1001 |
* %U. Url
|
1034 |
* %U. Url
|
1002 |
|
1035 |
|
1003 |
The default value for the string is:
|
1036 |
The default value for the string is:
|
1004 |
|
1037 |
|
1005 |
%R %S %L <b>%T</b><br>
|
1038 |
<img src="%I" align="left">%R %S %L <b>%T</b><br>
|
1006 |
%M %D <i>%U</i><br>
|
1039 |
%M %D <i>%U</i><br>
|
1007 |
%A %K
|
1040 |
%A %K
|
1008 |
|
1041 |
|
1009 |
|
1042 |
|
1010 |
You may, for example, try the following for a more web-like
|
1043 |
You may, for example, try the following for a more web-like
|
|
... |
|
... |
1012 |
|
1045 |
|
1013 |
<u><b><a href="P%N">%T</a></b></u><br>
|
1046 |
<u><b><a href="P%N">%T</a></b></u><br>
|
1014 |
%A<font color=#008000>%U - %S</font> - %L
|
1047 |
%A<font color=#008000>%U - %S</font> - %L
|
1015 |
|
1048 |
|
1016 |
|
1049 |
|
|
|
1050 |
Or the clean looking:
|
|
|
1051 |
|
|
|
1052 |
<img src="%I" align="left">%L <font color="#900000">%R</font>
|
|
|
1053 |
<b>%T</b><br>%S
|
|
|
1054 |
<font color="#808080"><i>%U</i></font>
|
|
|
1055 |
<table bgcolor="#e0e0e0">
|
|
|
1056 |
<tr><td><div>%A</div></td></tr>
|
|
|
1057 |
</table>%K
|
|
|
1058 |
|
|
|
1059 |
|
1017 |
The format of the Preview and Edit links is <a href="Pdocnum"> and <a
|
1060 |
The format of the Preview and Edit links is <a href="Pdocnum"> and <a
|
1018 |
href="Edocnum"> where docnum is what %N would print. This makes the
|
1061 |
href="Edocnum"> where docnum is what %N would print. This makes the
|
1019 |
title a preview link in the above format.
|
1062 |
title a preview link in the above format.
|
|
|
1063 |
|
|
|
1064 |
Please note that, due to the way the program handles right mouse
|
|
|
1065 |
clicks in the result list, if the custom formatting results in
|
|
|
1066 |
multiple paragraphs per result, right clicks will only work inside the
|
|
|
1067 |
first one.
|
1020 |
|
1068 |
|
1021 |
* HTML help browser: this will let you choose your preferred browser
|
1069 |
* HTML help browser: this will let you choose your preferred browser
|
1022 |
which will be started from the Help menu to read the user manual. You
|
1070 |
which will be started from the Help menu to read the user manual. You
|
1023 |
can enter a simple name if the command is in your PATH, or browse for
|
1071 |
can enter a simple name if the command is in your PATH, or browse for
|
1024 |
a full pathname.
|
1072 |
a full pathname.
|
1025 |
|
|
|
1026 |
* Show document type icons in result list: icons in the result list can
|
|
|
1027 |
be turned off. They take quite a lot of space and convey relatively
|
|
|
1028 |
little useful information.
|
|
|
1029 |
|
1073 |
|
1030 |
* Auto-start simple search on white space entry: if this is checked, a
|
1074 |
* Auto-start simple search on white space entry: if this is checked, a
|
1031 |
search will be executed each time you enter a space in the simple
|
1075 |
search will be executed each time you enter a space in the simple
|
1032 |
search input field. This lets you look at the result list as you enter
|
1076 |
search input field. This lets you look at the result list as you enter
|
1033 |
new terms. This is off by default, you may like it or not...
|
1077 |
new terms. This is off by default, you may like it or not...
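
To get a feeling for how the result list paragraph format string described
earlier in this list is expanded, here is a minimal Python sketch of the
%-substitution. It only mimics the idea: the function name format_result and
the choice of leaving unknown %-letters untouched are assumptions made for the
illustration, not a description of what recoll does internally.

```python
import re


def format_result(fmt, values):
    """Replace each %X in fmt by values["X"]; unknown letters are left as-is.

    Hypothetical helper: 'values' maps a substitution letter (T, M, N, ...)
    to the text which should appear in the result paragraph.
    """
    def substitute(match):
        letter = match.group(1)
        return values.get(letter, match.group(0))

    return re.sub(r"%([A-Za-z])", substitute, fmt)


if __name__ == "__main__":
    # The title is turned into a preview link, as in the web-like example.
    fmt = '<u><b><a href="P%N">%T</a></b></u><br>'
    print(format_result(fmt, {"N": "3", "T": "Some title"}))
```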
|
|
... |
|
... |
1084 |
|
1128 |
|
1085 |
Chapter 4. Installation
|
1129 |
Chapter 4. Installation
|
1086 |
|
1130 |
|
1087 |
4.1. Installing a prebuilt copy
|
1131 |
4.1. Installing a prebuilt copy
|
1088 |
|
1132 |
|
1089 |
Recoll binary installations are always linked statically to the xapian
|
1133 |
Recoll binary packages from the Recoll web site are always linked
|
1090 |
libraries, and have no other dependencies. You will only have to check or
|
1134 |
statically to the Xapian libraries, and have no other dependencies. You
|
1091 |
install supporting applications for the file types that you want to index
|
1135 |
will only have to check or install supporting applications for the file
|
1092 |
beyond text, HTML and mail files.
|
1136 |
types that you want to index beyond text, HTML and mail files, and maybe
|
|
|
1137 |
have a look at the configuration section (but this may not be necessary
|
|
|
1138 |
for a quick test with default parameters).
|
1093 |
|
1139 |
|
1094 |
----------------------------------------------------------------------
|
1140 |
----------------------------------------------------------------------
|
1095 |
|
1141 |
|
1096 |
4.1.1. Installing through a package system
|
1142 |
4.1.1. Installing through a package system
|
1097 |
|
1143 |
|
1098 |
If you use a BSD-type port system or a prebuilt package (RPM or other),
|
1144 |
If you use a BSD-type port system or a prebuilt package (RPM or other),
|
1099 |
just follow the usual procedure, and maybe have a look at the
|
1145 |
just follow the usual procedure for your system.
|
1100 |
configuration section (but this may not be necessary for a quick test with
|
|
|
1101 |
default parameters).
|
|
|
1102 |
|
1146 |
|
1103 |
----------------------------------------------------------------------
|
1147 |
----------------------------------------------------------------------
|
1104 |
|
1148 |
|
1105 |
4.1.2. Installing a prebuilt Recoll
|
1149 |
4.1.2. Installing a prebuilt Recoll
|
1106 |
|
1150 |
|
1107 |
The unpackaged binary versions are just compressed tar files of a build
|
1151 |
The unpackaged binary versions on the Recoll web site are just compressed
|
1108 |
tree, where only the useful parts were kept (executables and sample
|
1152 |
tar files of a build tree, where only the useful parts were kept
|
1109 |
configuration).
|
1153 |
(executables and sample configuration).
|
1110 |
|
1154 |
|
1111 |
The executable binary files are built with a static link to libxapian and
|
1155 |
The executable binary files are built with a static link to libxapian and
|
1112 |
libiconv, to make installation easier (no dependencies). However, this
|
1156 |
libiconv, to make installation easier (no dependencies).
|
1113 |
also means that you cannot change the versions which are used.
|
|
|
1114 |
|
1157 |
|
1115 |
After extracting the tar file, you can proceed with installation as if you
|
1158 |
After extracting the tar file, you can proceed with installation as if you
|
1116 |
had built the package from source (that is, just type make install). The
|
1159 |
had built the package from source (that is, just type make install). The
|
1117 |
binary trees are built for installation to /usr/local.
|
1160 |
binary trees are built for installation to /usr/local.
|
1118 |
|
1161 |
|
1119 |
You may then need to install external applications to process some file
|
|
|
1120 |
types that you want indexed (e.g. Acrobat, PostScript...). See the next
|
|
|
1121 |
section.
|
|
|
1122 |
|
|
|
1123 |
Finally, you may want to have a look at the configuration section.
|
|
|
1124 |
|
|
|
1125 |
----------------------------------------------------------------------
|
1162 |
----------------------------------------------------------------------
|
1126 |
|
1163 |
|
1127 |
4.2. Supporting packages
|
1164 |
4.2. Supporting packages
|
1128 |
|
1165 |
|
1129 |
Recoll uses external applications to index some file types. You need to
|
1166 |
Recoll uses external applications to index some file types. You need to
|
|
... |
|
... |
1159 |
4.3. Building from source
|
1196 |
4.3. Building from source
|
1160 |
|
1197 |
|
1161 |
4.3.1. Prerequisites
|
1198 |
4.3.1. Prerequisites
|
1162 |
|
1199 |
|
1163 |
At the very least, you will need to download and install the xapian core
|
1200 |
At the very least, you will need to download and install the xapian core
|
1164 |
package (Recoll development currently uses version 0.9.5), and the qt
|
1201 |
package (Recoll 1.9 normally uses version 1.0.2, but any 0.9 or 1.0.x
|
1165 |
run-time and development packages (Recoll development currently uses
|
1202 |
version will work too), and the qt run-time and development packages
|
1166 |
version 3.3.5, but any 3.3 version is probably OK).
|
1203 |
(Recoll development currently uses version 3.3.5, but any 3.3 version is
|
|
|
1204 |
probably OK).
|
1167 |
|
1205 |
|
1168 |
You will most probably be able to find a binary package for qt for your
|
1206 |
You will most probably be able to find a binary package for qt for your
|
1169 |
system. You may have to compile Xapian but this is not difficult (if you
|
1207 |
system. You may have to compile Xapian but this is not difficult (if you
|
1170 |
are using FreeBSD, there is a port).
|
1208 |
are using FreeBSD, there is a port).
|
1171 |
|
1209 |
|
|
... |
|
... |
1176 |
----------------------------------------------------------------------
|
1214 |
----------------------------------------------------------------------
|
1177 |
|
1215 |
|
1178 |
4.3.2. Building
|
1216 |
4.3.2. Building
|
1179 |
|
1217 |
|
1180 |
Recoll has been built on Linux (redhat7.3, mandriva 2005/6, Fedora Core
|
1218 |
Recoll has been built on Linux (redhat7.3, mandriva 2005/6, Fedora Core
|
1181 |
3/4/5), FreeBSD and Solaris 8. If you build on another system, I would
|
1219 |
3/4/5/6), FreeBSD 5/6, macosx, and Solaris 8. If you build on another
|
1182 |
very much welcome patches.
|
1220 |
system, and need to modify things, I would very much welcome patches.
|
1183 |
|
1221 |
|
1184 |
Depending on the qt configuration on your system, you may have to set the
|
1222 |
Depending on the qt configuration on your system, you may have to set the
|
1185 |
QTDIR and QMAKESPECS variables in your environment:
|
1223 |
QTDIR and QMAKESPECS variables in your environment:
|
1186 |
|
1224 |
|
1187 |
* QTDIR should point to the directory above the one that holds the qt
|
1225 |
* QTDIR should point to the directory above the one that holds the qt
|
|
... |
|
... |
1368 |
|
1406 |
|
1369 |
Where the messages should go. 'stderr' can be used as a special
|
1407 |
Where the messages should go. 'stderr' can be used as a special
|
1370 |
value, and is the default. The daem version is specific to the
|
1408 |
value, and is the default. The daem version is specific to the
|
1371 |
indexing monitor daemon.
|
1409 |
indexing monitor daemon.
|
1372 |
|
1410 |
|
1373 |
filtersdir
|
|
|
1374 |
|
|
|
1375 |
A directory to search for the external filter scripts used to
|
|
|
1376 |
index some types of files. The value should not be changed, except
|
|
|
1377 |
if you want to modify one of the default scripts. The value can be
|
|
|
1378 |
redefined for any sub-directory.
|
|
|
1379 |
|
|
|
1380 |
indexstemminglanguages
|
1411 |
indexstemminglanguages
|
1381 |
|
1412 |
|
1382 |
A list of languages for which the stem expansion databases will be
|
1413 |
A list of languages for which the stem expansion databases will be
|
1383 |
built. See recollindex(1) for possible values. You can add a stem
|
1414 |
built. See recollindex(1) or use the recollindex -l command for
|
1384 |
expansion database for a different language by using recollindex
|
1415 |
possible values. You can add a stem expansion database for a
|
1385 |
-s, but it will be deleted during the next indexing. Only
|
1416 |
different language by using recollindex -s, but it will be deleted
|
|
|
1417 |
during the next indexing. Only languages listed in the
|
1386 |
languages listed in the configuration file are permanent.
|
1418 |
configuration file are permanent.
|
1387 |
|
1419 |
|
1388 |
defaultcharset
|
1420 |
defaultcharset
|
1389 |
|
1421 |
|
1390 |
The name of the character set used for files that do not contain a
|
1422 |
The name of the character set used for files that do not contain a
|
1391 |
character set definition (e.g. plain text files). This can be
|
1423 |
character set definition (e.g. plain text files). This can be
|
1392 |
redefined for any sub-directory. If it is not set at all, the
|
1424 |
redefined for any sub-directory. If it is not set at all, the
|
1393 |
character set used is the one defined by the nls environment
|
1425 |
character set used is the one defined by the nls environment
|
1394 |
(LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
|
1426 |
(LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
|
|
|
1427 |
|
|
|
1428 |
maxfsoccuppc
|
|
|
1429 |
|
|
|
1430 |
Maximum file system occupation before we stop indexing. The value
|
|
|
1431 |
is a percentage, corresponding to what the "Capacity" df output
|
|
|
1432 |
column shows. The default value is 0, meaning no checking.
|
|
|
1433 |
|
|
|
1434 |
idxflushmb
|
|
|
1435 |
|
|
|
1436 |
Threshold (megabytes of new text data) where we flush from memory
|
|
|
1437 |
to disk index. Setting this can help control memory usage. A value
|
|
|
1438 |
of 0 means no explicit flushing, letting Xapian use its own
|
|
|
1439 |
default, which is flushing every 10000 documents (memory usage
|
|
|
1440 |
depends on average document size). The default value is 10.
|
|
|
1441 |
|
|
|
1442 |
filtersdir
|
|
|
1443 |
|
|
|
1444 |
A directory to search for the external filter scripts used to
|
|
|
1445 |
index some types of files. The value should not be changed, except
|
|
|
1446 |
if you want to modify one of the default scripts. The value can be
|
|
|
1447 |
redefined for any sub-directory.
|
|
|
1448 |
|
|
|
1449 |
iconsdir
|
|
|
1450 |
|
|
|
1451 |
The name of the directory where recoll result list icons are
|
|
|
1452 |
stored. You can change this if you want different images.
|
1395 |
|
1453 |
|
1396 |
guesscharset
|
1454 |
guesscharset
|
1397 |
|
1455 |
|
1398 |
Decide if we try to guess the character set of files if no
|
1456 |
Decide if we try to guess the character set of files if no
|
1399 |
internal value is available (e.g. for plain text files). This does
|
1457 |
internal value is available (e.g. for plain text files). This does
|
|
... |
|
... |
1422 |
database. This is so that they can be displayed inside the result
|
1480 |
database. This is so that they can be displayed inside the result
|
1423 |
lists without decoding the original file. This parameter defines
|
1481 |
lists without decoding the original file. This parameter defines
|
1424 |
the size of the stored abstract (which can come from an actual
|
1482 |
the size of the stored abstract (which can come from an actual
|
1425 |
section or just be the beginning of the text). The default value
|
1483 |
section or just be the beginning of the text). The default value
|
1426 |
is 250.
|
1484 |
is 250.
|
1427 |
|
|
|
1428 |
iconsdir
|
|
|
1429 |
|
|
|
1430 |
The name of the directory where recoll result list icons are
|
|
|
1431 |
stored. You can change this if you want different images.
|
|
|
1432 |
|
1485 |
|
1433 |
aspellLanguage
|
1486 |
aspellLanguage
|
1434 |
|
1487 |
|
1435 |
Language definitions to use when creating the aspell dictionary.
|
1488 |
Language definitions to use when creating the aspell dictionary.
|
1436 |
The value must match a set of aspell language definition files.
|
1489 |
The value must match a set of aspell language definition files.
|
|
... |
|
... |
1569 |
The rclblob filter should be an executable program or script which exists
|
1622 |
The rclblob filter should be an executable program or script which exists
|
1570 |
inside /usr/[local/]share/recoll/filters. It will be given a file name as
|
1623 |
inside /usr/[local/]share/recoll/filters. It will be given a file name as
|
1571 |
argument and should output the text contents in html format on the
|
1624 |
argument and should output the text contents in html format on the
|
1572 |
standard output.
|
1625 |
standard output.
|
1573 |
|
1626 |
|
|
|
1627 |
You can find more details about writing a Recoll filter in the section
|
|
|
1628 |
about writing filters.
|
|
|
1629 |
|
|
|
1630 |
----------------------------------------------------------------------
|
|
|
1631 |
|
|
|
1632 |
4.5. Extending Recoll
|
|
|
1633 |
|
|
|
1634 |
4.5.1. Writing a document filter
|
|
|
1635 |
|
|
|
1636 |
Recoll filters are executable programs which translate from a specific
|
|
|
1637 |
format (e.g. OpenOffice, Acrobat) to the Recoll indexing input
|
|
|
1638 |
format, which was chosen to be HTML.
|
|
|
1639 |
|
|
|
1640 |
Recoll filters are usually shell-scripts, but this is in no way necessary.
|
|
|
1641 |
These programs are extremely simple and most of the difficulty lies in
|
|
|
1642 |
extracting the text from the native format, not outputting what is
|
|
|
1643 |
expected by Recoll. Happily enough, most document formats already have
|
|
|
1644 |
translators or text extractors which handle the difficult part and can be
|
|
|
1645 |
called from the filter.
|
|
|
1646 |
|
|
|
1647 |
Filters are called with a single argument which is the source file name.
|
|
|
1648 |
They should output the result to stdout.
|
|
|
1649 |
|
|
|
1650 |
The RECOLL_FILTER_FORPREVIEW environment variable (values yes, no) tells
|
|
|
1651 |
the filter if the operation is for indexing or previewing. Some filters
|
|
|
1652 |
use this to output a slightly different format. This is not essential.
|
|
|
1653 |
|
1574 |
The html could be very minimal like the following example:
|
1654 |
The output HTML could be very minimal like the following example:
|
1575 |
|
1655 |
|
1576 |
<html><head>
|
1656 |
<html><head>
|
1577 |
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
|
1657 |
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
|
1578 |
</head>
|
1658 |
</head>
|
1579 |
<body>some text content</body></html>
|
1659 |
<body>some text content</body></html>
|
|
... |
|
... |
1588 |
accurate for good results.
|
1668 |
accurate for good results.
|
1589 |
|
1669 |
|
1590 |
Recoll will also make use of other header fields if they are present:
|
1670 |
Recoll will also make use of other header fields if they are present:
|
1591 |
title, description, keywords.
|
1671 |
title, description, keywords.
|
1592 |
|
1672 |
|
|
|
1673 |
As of Recoll release 1.9, filters also have the possibility to "invent"
|
|
|
1674 |
field names. This should be output as meta tags:
|
|
|
1675 |
|
|
|
1676 |
<meta name="somefield" content="Some textual data" />
|
|
|
1677 |
|
|
|
1678 |
In this case, a correspondence between field name and Xapian prefix should
|
|
|
1679 |
also be added to the mimeconf file. See the existing entries for
|
|
|
1680 |
inspiration. The field can then be used inside the query language to
|
|
|
1681 |
narrow searches.
|
|
|
1682 |
|
1593 |
The easiest way to write a new filter is probably to start from an
|
1683 |
The easiest way to write a new filter is probably to start from an
|
1594 |
existing one.
|
1684 |
existing one.
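
If no existing filter is a good starting point, the contract described above
(one file name argument, HTML on stdout, optional meta fields, the
RECOLL_FILTER_FORPREVIEW variable) is small enough to sketch in a few lines.
The following Python example is only an illustration under stated
assumptions: it treats the input as UTF-8 plain text where a real filter
would call an external text extractor, the function name render_html and the
field name somefield are made up, and wrapping the text in <pre> for previews
is just one possible use of the preview flag.

```python
#!/usr/bin/env python3
import html
import os
import sys


def render_html(text, title="", fields=None, for_preview=False):
    """Build the minimal HTML document expected on the filter's stdout."""
    head = ['<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">']
    if title:
        head.append("<title>%s</title>" % html.escape(title))
    # Extra ("invented") fields are output as meta tags.
    for name, content in (fields or {}).items():
        head.append('<meta name="%s" content="%s" />'
                    % (html.escape(name), html.escape(content)))
    body = html.escape(text)
    if for_preview:
        # Assumption: keep line breaks readable when previewing.
        body = "<pre>" + body + "</pre>"
    return "<html><head>\n%s\n</head>\n<body>%s</body></html>\n" % (
        "\n".join(head), body)


def main(argv):
    # The single argument is the source file name.
    with open(argv[1], encoding="utf-8", errors="replace") as src:
        text = src.read()
    for_preview = os.environ.get("RECOLL_FILTER_FORPREVIEW", "no") == "yes"
    out = render_html(text, title=os.path.basename(argv[1]),
                      fields={"somefield": "Some textual data"},
                      for_preview=for_preview)
    # Write bytes so that the output really is UTF-8, as declared in the head.
    sys.stdout.buffer.write(out.encode("utf-8"))
    return 0


# Only act as a filter when exactly one file name is given.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv))
```

As explained above, an invented field such as somefield only becomes
searchable once the corresponding entry has been added to the mimeconf file.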
|
1595 |
|
1685 |
|
1596 |
----------------------------------------------------------------------
|
1686 |
----------------------------------------------------------------------
|