|
a/src/doc/user/usermanual.xml |
|
b/src/doc/user/usermanual.xml |
|
... |
|
... |
37 |
web site</ulink>.</literal></para>
|
37 |
web site</ulink>.</literal></para>
|
38 |
|
38 |
|
39 |
<para>This document introduces full text search notions
|
39 |
<para>This document introduces full text search notions
|
40 |
and describes the installation and use of the &RCL;
|
40 |
and describes the installation and use of the &RCL;
|
41 |
application. It currently describes &RCL; &RCLVERSION;.</para>
|
41 |
application. It currently describes &RCL; &RCLVERSION;.</para>
|
42 |
<!-- <para>[ <ulink url="index.html">Split HTML</ulink> /
|
|
|
43 |
<ulink url="usermanual-xml.html">Single HTML</ulink> ]</para>
|
|
|
44 |
-->
|
|
|
45 |
</abstract>
|
42 |
</abstract>
|
46 |
|
43 |
|
47 |
|
44 |
|
48 |
</bookinfo>
|
45 |
</bookinfo>
|
49 |
|
46 |
|
|
... |
|
... |
139 |
punctuation and capitalization are lost).</para>
|
136 |
punctuation and capitalization are lost).</para>
|
140 |
|
137 |
|
141 |
<para>&RCL; stores all internal data in <application>Unicode
|
138 |
<para>&RCL; stores all internal data in <application>Unicode
|
142 |
UTF-8</application> format, and it can index files with
|
139 |
UTF-8</application> format, and it can index files with
|
143 |
different character sets, encodings, and languages into the same
|
140 |
different character sets, encodings, and languages into the same
|
144 |
index. It has input filters for many document types.</para>
|
141 |
index. It can process many document types.</para>
|
145 |
|
142 |
|
146 |
<para>Stemming is the process by which &RCL; reduces words to
|
143 |
<para>Stemming is the process by which &RCL; reduces words to
|
147 |
their radicals so that searching does not depend, for example, on a
|
144 |
their radicals so that searching does not depend, for example, on a
|
148 |
word being singular or plural (floor, floors), or on a verb tense
|
145 |
word being singular or plural (floor, floors), or on a verb tense
|
149 |
(flooring, floored). Because the mechanisms used for stemming
|
146 |
(flooring, floored). Because the mechanisms used for stemming
|
|
... |
|
... |
379 |
be ignored.</para>
|
376 |
be ignored.</para>
|
380 |
<para>Excluding types can be done by adding wildcard name
|
377 |
<para>Excluding types can be done by adding wildcard name
|
381 |
patterns to the <literal>skippedNames</literal> list, which
|
378 |
patterns to the <literal>skippedNames</literal> list, which
|
382 |
can be done from the GUI Index configuration menu. It is
|
379 |
can be done from the GUI Index configuration menu. It is
|
383 |
also possible to exclude a mime type independently of the
|
380 |
also possible to exclude a mime type independently of the
|
384 |
file name by associating it with
|
381 |
file name by associating it with the
|
385 |
the <filename>rclnull</filename> filter. This can be done by
|
382 |
<filename>rclnull</filename> input handler. This can be done
|
386 |
editing the <link linkend="RCL.INSTALL.CONFIG.MIMECONF">
|
383 |
by editing the <link linkend="RCL.INSTALL.CONFIG.MIMECONF">
|
387 |
<filename>mimeconf</filename> configuration
|
384 |
<filename>mimeconf</filename> configuration
|
388 |
file</link>.</para>
|
385 |
file</link>.</para>
|
389 |
|
386 |
|
390 |
<para>In order to define a positive list, you need to edit the
|
387 |
<para>In order to define a positive list, you need to edit the
|
391 |
<link linkend="RCL.INSTALL.CONFIG.RECOLLCONF">main
|
388 |
<link linkend="RCL.INSTALL.CONFIG.RECOLLCONF">main
|
|
... |
|
... |
2461 |
stored by default, apart from the values above
|
2458 |
stored by default, apart from the values above
|
2462 |
(only <literal>author</literal>
|
2459 |
(only <literal>author</literal>
|
2463 |
and <literal>filename</literal>), so this feature will need
|
2460 |
and <literal>filename</literal>), so this feature will need
|
2464 |
some custom local configuration to be useful. An example
|
2461 |
some custom local configuration to be useful. An example
|
2465 |
candidate would be the <literal>recipient</literal> field
|
2462 |
candidate would be the <literal>recipient</literal> field
|
2466 |
which is generated by the message filters.</para>
|
2463 |
which is generated by the message input handlers.</para>
|
2467 |
|
2464 |
|
2468 |
<para>The default value for the paragraph format string is:
|
2465 |
<para>The default value for the paragraph format string is:
|
2469 |
<screen><![CDATA[
|
2466 |
<screen><![CDATA[
|
2470 |
<img src="%I" align="left">%R %S %L <b>%T</b><br>
|
2467 |
<img src="%I" align="left">%R %S %L <b>%T</b><br>
|
2471 |
%M %D <i>%U</i> %i<br>
|
2468 |
%M %D <i>%U</i> %i<br>
|
|
... |
|
... |
2959 |
slow search (or even an incorrect one if the expansion is
|
2956 |
slow search (or even an incorrect one if the expansion is
|
2960 |
truncated because of excessive size). Also see
|
2957 |
truncated because of excessive size). Also see
|
2961 |
<link linkend="RCL.SEARCH.WILDCARDS">
|
2958 |
<link linkend="RCL.SEARCH.WILDCARDS">
|
2962 |
More about wildcards</link>.</para>
|
2959 |
More about wildcards</link>.</para>
|
2963 |
|
2960 |
|
2964 |
<para>The document filters used while indexing have the
|
2961 |
<para>The document input handlers used while indexing have the
|
2965 |
possibility to create other fields with arbitrary names, and
|
2962 |
possibility to create other fields with arbitrary names, and
|
2966 |
aliases may be defined in the configuration, so that the exact
|
2963 |
aliases may be defined in the configuration, so that the exact
|
2967 |
field search possibilities may be different for you if someone
|
2964 |
field search possibilities may be different for you if someone
|
2968 |
took care of the customisation.</para>
|
2965 |
took care of the customisation.</para>
|
2969 |
|
2966 |
|
|
... |
|
... |
3291 |
<para>&RCL; has an Application Programming Interface, usable both
|
3288 |
<para>&RCL; has an Application Programming Interface, usable both
|
3292 |
for indexing and searching, currently accessible from the
|
3289 |
for indexing and searching, currently accessible from the
|
3293 |
<application>Python</application> language.</para>
|
3290 |
<application>Python</application> language.</para>
|
3294 |
|
3291 |
|
3295 |
<para>Another less radical way to extend the application is to
|
3292 |
<para>Another less radical way to extend the application is to
|
3296 |
write filters for new types of documents.</para>
|
3293 |
write input handlers for new types of documents.</para>
|
3297 |
|
3294 |
|
3298 |
<para>The processing of metadata attributes for documents
|
3295 |
<para>The processing of metadata attributes for documents
|
3299 |
(<literal>fields</literal>) is highly configurable.</para>
|
3296 |
(<literal>fields</literal>) is highly configurable.</para>
|
3300 |
|
3297 |
|
3301 |
|
3298 |
|
3302 |
|
3299 |
|
3303 |
<sect1 id="RCL.PROGRAM.FILTERS">
|
3300 |
<sect1 id="RCL.PROGRAM.FILTERS">
|
3304 |
<title>Writing a document filter</title>
|
3301 |
<title>Writing a document input handler</title>
|
|
|
3302 |
|
|
|
3303 |
<note><title>Terminology</title>The small programs or pieces
|
|
|
3304 |
of code which handle the processing of the different document
|
|
|
3305 |
types for &RCL; used to be called <literal>filters</literal>,
|
|
|
3306 |
which is still reflected in the name of the directory which
|
|
|
3307 |
holds them and many configuration variables. They were named
|
|
|
3308 |
this way because one of their primary functions is to filter
|
|
|
3309 |
out the formatting directives and keep the text
|
|
|
3310 |
content. However these modules may have other behaviours, and
|
|
|
3311 |
the term <literal>input handler</literal> is now progressively
|
|
|
3312 |
substituted in the documentation. <literal>filter</literal> is
|
|
|
3313 |
still used in many places though.</note>
|
3305 |
|
3314 |
|
3306 |
<para>&RCL; filters cooperate to translate from the multitude
|
3315 |
<para>&RCL; input handlers cooperate to translate from the multitude
|
3307 |
of input document formats, simple ones
|
3316 |
of input document formats, simple ones
|
3308 |
(such as <application>opendocument</application>,
|
3317 |
(such as <application>opendocument</application>,
|
3309 |
<application>acrobat</application>), or compound ones such
|
3318 |
<application>acrobat</application>), or compound ones such
|
3310 |
as <application>Zip</application>
|
3319 |
as <application>Zip</application>
|
3311 |
or <application>Email</application>, into the final &RCL;
|
3320 |
or <application>Email</application>, into the final &RCL;
|
3312 |
indexing input format, which may
|
3321 |
indexing input format, which is plain text.
|
3313 |
be <literal>text/plain</literal>
|
3322 |
Most input handlers are executable
|
3314 |
or <literal>text/html</literal>. Most filters are executable
|
|
|
3315 |
programs or scripts. A few filters are coded in C++ and live
|
3323 |
programs or scripts. A few handlers are coded in C++ and live
|
3316 |
inside <command>recollindex</command>. This latter kind will not
|
3324 |
inside <command>recollindex</command>. This latter kind will not
|
3317 |
be described here.</para>
|
3325 |
be described here.</para>
|
3318 |
|
3326 |
|
3319 |
<para>There are currently (1.18 and since 1.13) two kinds of
|
3327 |
<para>There are currently (1.18 and since 1.13) two kinds of
|
3320 |
external executable filters:
|
3328 |
external executable input handlers:
|
3321 |
<itemizedlist>
|
3329 |
<itemizedlist>
|
3322 |
<listitem><para>Simple filters (<literal>exec</literal>
|
3330 |
<listitem><para>Simple <literal>exec</literal> handlers
|
3323 |
filters) run once and
|
3331 |
run once and exit. They can be bare programs like
|
3324 |
exit. They can be bare programs
|
3332 |
<command>antiword</command>, or scripts using other
|
3325 |
like <application>antiword</application>, or scripts
|
3333 |
programs. They are very simple to write, because they just
|
3326 |
using other programs. They are very simple to write,
|
3334 |
need to print the converted document to the standard
|
3327 |
because they just need to print the converted document
|
3335 |
output. Their output can be plain text or HTML. HTML is
|
3328 |
to the standard output. Their output can
|
3336 |
usually preferred because it can store metadata fields and
|
3329 |
be <literal>text/plain</literal>
|
3337 |
it allows preserving some of the formatting for the GUI
|
3330 |
or <literal>text/html</literal>.</para>
|
3338 |
preview.</para>
|
3331 |
</listitem>
|
3339 |
</listitem>
|
3332 |
<listitem><para>Multiple filters (<literal>execm</literal>
|
3340 |
<listitem><para>Multiple <literal>execm</literal> handlers
|
3333 |
filters), run as long as
|
3341 |
can process multiple files (sparing the process startup
|
3334 |
their master process (<command>recollindex</command>) is
|
3342 |
time which can be very significant), or multiple documents
|
3335 |
active. They can process multiple files (sparing the
|
3343 |
per file (e.g.: for <application>zip</application> or
|
3336 |
process startup time which can be very significant),
|
3344 |
<application>chm</application> files). They communicate
|
3337 |
or multiple documents per file (e.g.: for zip or chm
|
3345 |
with the indexer through a simple protocol, but are
|
3338 |
files). They communicate with the indexer through a
|
3346 |
nevertheless a bit more complicated than the older
|
3339 |
simple protocol, but are nevertheless a bit more
|
3347 |
kind. Most new handlers are written in
|
3340 |
complicated than the older kind. Most new
|
|
|
3341 |
filters are written
|
|
|
3342 |
in <application>Python</application>, using a common
|
3348 |
<application>Python</application>, using a common module
|
3343 |
module to handle the protocol. There is an
|
3349 |
to handle the protocol. There is an exception,
|
3344 |
exception, <command>rclimg</command> which is written
|
3350 |
<command>rclimg</command> which is written in Perl. The
|
3345 |
in Perl. The subdocuments output by these filters can
|
3351 |
subdocuments output by these handlers can be directly
|
3346 |
be directly indexable (text or HTML), or they can be
|
3352 |
indexable (text or HTML), or they can be other simple or
|
3347 |
other simple or compound documents that will need to
|
3353 |
compound documents that will need to be processed by
|
3348 |
be processed by another filter.</para>
|
3354 |
another handler.</para>
|
3349 |
</listitem>
|
3355 |
</listitem>
|
3350 |
</itemizedlist>
|
3356 |
</itemizedlist>
|
3351 |
</para>
|
3357 |
</para>
|
3352 |
|
3358 |
|
3353 |
<para>In both cases, filters deal with regular file system
|
3359 |
<para>In both cases, handlers deal with regular file system
|
3354 |
files, and can process either a single document, or a
|
3360 |
files, and can process either a single document, or a
|
3355 |
linear list of documents in each file. &RCL; is responsible
|
3361 |
linear list of documents in each file. &RCL; is responsible
|
3356 |
for performing up-to-date checks, dealing with more complex
|
3362 |
for performing up-to-date checks, dealing with more complex
|
3357 |
embedding and other upper level issues.</para>
|
3363 |
embedding and other upper level issues.</para>
|
3358 |
|
3364 |
|
3359 |
<para>In the extreme case of a simple filter returning a
|
3365 |
<para>A simple handler returning a
|
3360 |
document in <literal>text/plain</literal> format, no
|
3366 |
document in <literal>text/plain</literal> format can transfer
|
3361 |
metadata can be transferred from the filter to the
|
|
|
3362 |
indexer. Generic metadata, like document size or
|
3367 |
no metadata to the indexer. Generic metadata, like document
|
3363 |
modification date, will be gathered and stored by the
|
3368 |
size or modification date, will be gathered and stored by
|
3364 |
indexer.</para>
|
3369 |
the indexer.</para>
|
3365 |
|
3370 |
|
3366 |
<para>Filters that produce <literal>text/html</literal>
|
3371 |
<para>Handlers that produce <literal>text/html</literal>
|
3367 |
format can return an arbitrary amount of metadata inside HTML
|
3372 |
format can return an arbitrary amount of metadata inside HTML
|
3368 |
<literal>meta</literal> tags. These will be processed
|
3373 |
<literal>meta</literal> tags. These will be processed
|
3369 |
according to the directives found in
|
3374 |
according to the directives found in
|
3370 |
the <link linkend="RCL.PROGRAM.FIELDS">
|
3375 |
the <link linkend="RCL.PROGRAM.FIELDS">
|
3371 |
<filename>fields</filename> configuration
|
3376 |
<filename>fields</filename> configuration
|
3372 |
file</link>.</para>
|
3377 |
file</link>.</para>
|
3373 |
|
3378 |
|
3374 |
<para>The filters that can handle multiple documents per file
|
3379 |
<para>The handlers that can handle multiple documents per file
|
3375 |
return a single piece of data to identify each document inside
|
3380 |
return a single piece of data to identify each document inside
|
3376 |
the file. This piece of data, called
|
3381 |
the file. This piece of data, called
|
3377 |
an <literal>ipath element</literal>, will be sent back by
|
3382 |
an <literal>ipath element</literal>, will be sent back by
|
3378 |
&RCL; to extract the document at query time, for previewing,
|
3383 |
&RCL; to extract the document at query time, for previewing,
|
3379 |
or for creating a temporary file to be opened by a
|
3384 |
or for creating a temporary file to be opened by a
|
3380 |
viewer.</para>
|
3385 |
viewer.</para>
|
3381 |
|
3386 |
|
3382 |
<para>The following section describes the simple
|
3387 |
<para>The following section describes the simple
|
3383 |
filters, and the next one gives a few explanations about
|
3388 |
handlers, and the next one gives a few explanations about
|
3384 |
the <literal>execm</literal> ones. You could conceivably
|
3389 |
the <literal>execm</literal> ones. You could conceivably
|
3385 |
write a simple filter with only the elements in the
|
3390 |
write a simple handler with only the elements in the
|
3386 |
manual. This will not be the case for the other ones, for
|
3391 |
manual. This will not be the case for the other ones, for
|
3387 |
which you will have to look at the code.</para>
|
3392 |
which you will have to look at the code.</para>
|
3388 |
|
3393 |
|
3389 |
<sect2 id="RCL.PROGRAM.FILTERS.SIMPLE">
|
3394 |
<sect2 id="RCL.PROGRAM.FILTERS.SIMPLE">
|
3390 |
<title>Simple filters</title>
|
3395 |
<title>Simple input handlers</title>
|
3391 |
|
3396 |
|
3392 |
<para>&RCL; simple filters are usually shell-scripts, but this is in
|
3397 |
<para>&RCL; simple handlers are usually shell-scripts, but this is in
|
3393 |
no way necessary. Extracting the text from the native format is the
|
3398 |
no way necessary. Extracting the text from the native format is the
|
3394 |
difficult part. Outputting the format expected by &RCL; is
|
3399 |
difficult part. Outputting the format expected by &RCL; is
|
3395 |
trivial. Happily enough, most document formats have translators or
|
3400 |
trivial. Happily enough, most document formats have translators or
|
3396 |
text extractors which can be called from the filter. In some cases
|
3401 |
text extractors which can be called from the handler. In some cases
|
3397 |
the output of the translating program is completely appropriate,
|
3402 |
the output of the translating program is completely appropriate,
|
3398 |
and no intermediate shell-script is needed.</para>
|
3403 |
and no intermediate shell-script is needed.</para>
|
3399 |
|
3404 |
|
3400 |
<para>Filters are called with a single argument which is the
|
3405 |
<para>Input handlers are called with a single argument which is the
|
3401 |
source file name. They should output the result to stdout.</para>
|
3406 |
source file name. They should output the result to stdout.</para>
|
3402 |
|
3407 |
|
3403 |
<para>When writing a filter, you should decide if it will output
|
3408 |
<para>When writing a handler, you should decide if it will output
|
3404 |
plain text or HTML. Plain text is simpler, but you will not be able
|
3409 |
plain text or HTML. Plain text is simpler, but you will not be able
|
3405 |
to add metadata or vary the output character encoding (this will be
|
3410 |
to add metadata or vary the output character encoding (this will be
|
3406 |
defined in a configuration file). Additionally, some formatting may
|
3411 |
defined in a configuration file). Additionally, some formatting may
|
3407 |
be easier to preserve when previewing HTML. Actually the deciding factor
|
3412 |
be easier to preserve when previewing HTML. Actually the deciding factor
|
3408 |
is metadata: &RCL; has a way to <link linkend="RCL.PROGRAM.FILTERS.HTML">
|
3413 |
is metadata: &RCL; has a way to <link linkend="RCL.PROGRAM.FILTERS.HTML">
|
3409 |
extract metadata from the HTML header and use it for field
|
3414 |
extract metadata from the HTML header and use it for field
|
3410 |
searches</link>.</para>
|
3415 |
searches</link>.</para>
|
3411 |
|
3416 |
|
3412 |
<para>The <envar>RECOLL_FILTER_FORPREVIEW</envar> environment
|
3417 |
<para>The <envar>RECOLL_FILTER_FORPREVIEW</envar> environment
|
3413 |
variable (values <literal>yes</literal>, <literal>no</literal>)
|
3418 |
variable (values <literal>yes</literal>, <literal>no</literal>)
|
3414 |
tells the filter if the operation is for indexing or
|
3419 |
tells the handler if the operation is for indexing or
|
3415 |
previewing. Some filters use this to output a slightly different
|
3420 |
previewing. Some handlers use this to output a slightly different
|
3416 |
format, for example stripping uninteresting repeated keywords (e.g.
|
3421 |
format, for example stripping uninteresting repeated keywords (e.g.
|
3417 |
<literal>Subject:</literal> for email) when indexing. This is not
|
3422 |
<literal>Subject:</literal> for email) when indexing. This is not
|
3418 |
essential.</para>
|
3423 |
essential.</para>
|
3419 |
|
3424 |
|
3420 |
<para>You should look at one of the simple filters, for example
|
3425 |
<para>You should look at one of the simple handlers, for example
|
3421 |
<command>rclps</command> for a starting point.</para>
|
3426 |
<command>rclps</command> for a starting point.</para>
|
3422 |
|
3427 |
|
3423 |
<para>Don't forget to make your filter executable before
|
3428 |
<para>Don't forget to make your handler executable before
|
3424 |
testing!</para>
|
3429 |
testing!</para>
|
3425 |
|
3430 |
|
3426 |
</sect2>
|
3431 |
</sect2>
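The description of simple handlers above can be sketched as follows. This is a minimal illustration in Python, not code from the &RCL; sources: the names `for_preview`, `prepare`, `convert` and `main` are hypothetical, and the input is assumed to be a UTF-8 text file standing in for a real format. It relies only on what the text states: the handler receives one argument (the source file name), prints the result to stdout, and may read `RECOLL_FILTER_FORPREVIEW`.

```python
#!/usr/bin/env python3
"""Sketch of a simple (exec) input handler producing plain text."""
import os
import sys


def for_preview(environ=os.environ):
    """Return True when RECOLL_FILTER_FORPREVIEW is set to 'yes'."""
    return environ.get("RECOLL_FILTER_FORPREVIEW", "no") == "yes"


def prepare(text, preview):
    """Strip a repeated keyword when indexing; keep it for previewing.

    Removing 'Subject:' is only an illustration of the distinction
    described in the manual, not the behaviour of an actual handler.
    """
    if preview:
        return text
    return text.replace("Subject:", "")


def convert(path, preview=False):
    """Hypothetical extraction step: a real handler would call a
    format-specific text extractor here instead of reading the file."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return prepare(f.read(), preview)


def main(argv):
    # The single argument is the source file name; output goes to stdout.
    sys.stdout.write(convert(argv[1], preview=for_preview()))
    return 0


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv))
```

Saved as an executable file, such a script would be run with the document path as its only argument. Whether plain text output is enough depends on the metadata needs discussed above.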
|
3427 |
|
3432 |
|
3428 |
<sect2 id="RCL.PROGRAM.FILTERS.MULTIPLE">
|
3433 |
<sect2 id="RCL.PROGRAM.FILTERS.MULTIPLE">
|
3429 |
<title>"Multiple" filters</title>
|
3434 |
<title>"Multiple" handlers</title>
|
3430 |
|
3435 |
|
3431 |
<para>If you can program and want to write
|
3436 |
<para>If you can program and want to write
|
3432 |
an <literal>execm</literal> filter, it should not be too
|
3437 |
an <literal>execm</literal> handler, it should not be too
|
3433 |
difficult to make sense of one of the existing modules. For
|
3438 |
difficult to make sense of one of the existing modules. For
|
3434 |
example, look at <command>rclzip</command> which uses Zip
|
3439 |
example, look at <command>rclzip</command> which uses Zip
|
3435 |
file paths as identifiers (<literal>ipath</literal>),
|
3440 |
file paths as identifiers (<literal>ipath</literal>),
|
3436 |
and <command>rclics</command>, which uses an integer
|
3441 |
and <command>rclics</command>, which uses an integer
|
3437 |
index. Also have a look at the comments inside
|
3442 |
index. Also have a look at the comments inside
|
3438 |
the <filename>internfile/mh_execm.h</filename> file and
|
3443 |
the <filename>internfile/mh_execm.h</filename> file and
|
3439 |
possibly at the corresponding module.</para>
|
3444 |
possibly at the corresponding module.</para>
|
3440 |
|
3445 |
|
3441 |
<para><literal>execm</literal> filters sometimes need to make
|
3446 |
<para><literal>execm</literal> handlers sometimes need to make
|
3442 |
a choice for the nature of the <literal>ipath</literal>
|
3447 |
a choice for the nature of the <literal>ipath</literal>
|
3443 |
elements that they use in communication with the
|
3448 |
elements that they use in communication with the
|
3444 |
indexer. Here are a few guidelines:
|
3449 |
indexer. Here are a few guidelines:
|
3445 |
<itemizedlist>
|
3450 |
<itemizedlist>
|
3446 |
<listitem><para>Use ASCII or UTF-8 (if the identifier is an
|
3451 |
<listitem><para>Use ASCII or UTF-8 (if the identifier is an
|
|
... |
|
... |
3451 |
debugging.</para></listitem>
|
3456 |
debugging.</para></listitem>
|
3452 |
<listitem><para>&RCL; uses a colon (<literal>:</literal>) as a
|
3457 |
<listitem><para>&RCL; uses a colon (<literal>:</literal>) as a
|
3453 |
separator to store a complex path internally (for
|
3458 |
separator to store a complex path internally (for
|
3454 |
deeper embedding). Colons inside
|
3459 |
deeper embedding). Colons inside
|
3455 |
the <literal>ipath</literal> elements output by a
|
3460 |
the <literal>ipath</literal> elements output by a
|
3456 |
filter will be escaped, but would be a bad choice as a
|
3461 |
handler will be escaped, but would be a bad choice as a
|
3457 |
filter-specific separator (mostly, again, for
|
3462 |
handler-specific separator (mostly, again, for
|
3458 |
debugging issues).</para></listitem>
|
3463 |
debugging issues).</para></listitem>
|
3459 |
</itemizedlist>
|
3464 |
</itemizedlist>
|
3460 |
In any case, the main goal is that it should
|
3465 |
In any case, the main goal is that it should
|
3461 |
be easy for the filter to extract the target document, given
|
3466 |
be easy for the handler to extract the target document, given
|
3462 |
the file name and the <literal>ipath</literal>
|
3467 |
the file name and the <literal>ipath</literal>
|
3463 |
element.</para>
|
3468 |
element.</para>
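The goal stated above (easy extraction given the file name and the `ipath` element) can be illustrated with a small sketch. It assumes a Zip archive and uses member paths as identifiers, as the text says <command>rclzip</command> does, but it is not taken from that handler and does not implement the `execm` protocol. The function names are hypothetical.

```python
import io
import zipfile


def list_ipaths(archive):
    """Return one ipath element per archive member (its path in the Zip).

    `archive` may be a file name or a binary file-like object.
    Directory entries are skipped because they hold no document data.
    """
    with zipfile.ZipFile(archive) as zf:
        return [name for name in zf.namelist() if not name.endswith("/")]


def extract_by_ipath(archive, ipath):
    """Return the raw bytes of the member identified by `ipath`."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(ipath)


def make_demo_archive():
    """Build a small in-memory archive, for demonstration only."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("docs/a.txt", "first")
        zf.writestr("docs/b.txt", "second")
    return buf.getvalue()
```

Because the identifier is the member path itself, extraction at query time is a single lookup, and the value stays readable when debugging.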
|
3464 |
|
3469 |
|
3465 |
<para><literal>execm</literal> filters will also produce
|
3470 |
<para><literal>execm</literal> handlers will also produce
|
3466 |
a document with a null <literal>ipath</literal>
|
3471 |
a document with a null <literal>ipath</literal>
|
3467 |
element. Depending on the type of document, this may have
|
3472 |
element. Depending on the type of document, this may have
|
3468 |
some associated data (e.g. the body of an email message), or
|
3473 |
some associated data (e.g. the body of an email message), or
|
3469 |
none (typical for an archive file). If it is empty, this
|
3474 |
none (typical for an archive file). If it is empty, this
|
3470 |
document will be useful anyway for some operations, as the
|
3475 |
document will be useful anyway for some operations, as the
|
3471 |
parent of the actual data documents.</para>
|
3476 |
parent of the actual data documents.</para>
|
3472 |
</sect2>
|
3477 |
</sect2>
|
3473 |
|
3478 |
|
3474 |
<sect2 id="RCL.PROGRAM.FILTERS.ASSOCIATION">
|
3479 |
<sect2 id="RCL.PROGRAM.FILTERS.ASSOCIATION">
|
3475 |
<title>Telling &RCL; about the filter</title>
|
3480 |
<title>Telling &RCL; about the handler</title>
|
3476 |
|
3481 |
|
3477 |
<para>There are two elements that link a file to the filter which
|
3482 |
<para>There are two elements that link a file to the handler which
|
3478 |
should process it: the association of file to mime type and the
|
3483 |
should process it: the association of file to mime type and the
|
3479 |
association of a mime type with a filter.</para>
|
3484 |
association of a mime type with a handler.</para>
|
3480 |
|
3485 |
|
3481 |
<para>The association of files to mime types is mostly based on
|
3486 |
<para>The association of files to mime types is mostly based on
|
3482 |
name suffixes. The types are defined inside the
|
3487 |
name suffixes. The types are defined inside the
|
3483 |
<link linkend="RCL.INSTALL.CONFIG.MIMEMAP">
|
3488 |
<link linkend="RCL.INSTALL.CONFIG.MIMEMAP">
|
3484 |
<filename>mimemap</filename> file</link>. Example:
|
3489 |
<filename>mimemap</filename> file</link>. Example:
|
|
... |
|
... |
3488 |
</programlisting>
|
3493 |
</programlisting>
|
3489 |
If no suffix association is found for the file name, &RCL; will try
|
3494 |
If no suffix association is found for the file name, &RCL; will try
|
3490 |
to execute the <command>file -i</command> command to determine a
|
3495 |
to execute the <command>file -i</command> command to determine a
|
3491 |
mime type.</para>
|
3496 |
mime type.</para>
|
3492 |
|
3497 |
|
3493 |
<para>The association of file types to filters is performed in
|
3498 |
<para>The association of file types to handlers is performed in
|
3494 |
the <link linkend="RCL.INSTALL.CONFIG.MIMECONF">
|
3499 |
the <link linkend="RCL.INSTALL.CONFIG.MIMECONF">
|
3495 |
<filename>mimeconf</filename> file</link>. A sample will probably be
|
3500 |
<filename>mimeconf</filename> file</link>. A sample will probably be
|
3496 |
of better help than a long explanation:</para>
|
3501 |
of better help than a long explanation:</para>
|
3497 |
<programlisting>
|
3502 |
<programlisting>
|
3498 |
|
3503 |
|
|
... |
|
... |
3530 |
<literal>iso-8859-1</literal> encoding is specified because it
|
3535 |
<literal>iso-8859-1</literal> encoding is specified because it
|
3531 |
is not the <literal>utf-8</literal> default, and not output by
|
3536 |
is not the <literal>utf-8</literal> default, and not output by
|
3532 |
<command>unrtf</command> in the HTML header section.</para>
|
3537 |
<command>unrtf</command> in the HTML header section.</para>
|
3533 |
</listitem>
|
3538 |
</listitem>
|
3534 |
<listitem><para><literal>application/x-chm</literal> is processed
|
3539 |
<listitem><para><literal>application/x-chm</literal> is processed
|
3535 |
by a persistent filter. This is determined by the
|
3540 |
by a persistent handler. This is determined by the
|
3536 |
<literal>execm</literal> keyword.</para>
|
3541 |
<literal>execm</literal> keyword.</para>
|
3537 |
</listitem>
|
3542 |
</listitem>
|
3538 |
</itemizedlist>
|
3543 |
</itemizedlist>
|
3539 |
</para>
|
3544 |
</para>
|
3540 |
|
3545 |
|
3541 |
</sect2>
|
3546 |
</sect2>
|
3542 |
|
3547 |
|
3543 |
<sect2 id="RCL.PROGRAM.FILTERS.HTML">
|
3548 |
<sect2 id="RCL.PROGRAM.FILTERS.HTML">
|
3544 |
<title>Filter HTML output</title>
|
3549 |
<title>Input handler HTML output</title>
|
3545 |
|
3550 |
|
3546 |
<para>The output HTML could be very minimal like the following
|
3551 |
<para>The output HTML could be very minimal like the following
|
3547 |
example:
|
3552 |
example:
|
3548 |
<programlisting>
|
3553 |
<programlisting>
|
3549 |
<html>
|
3554 |
<html>
|
|
... |
|
... |
3605 |
<programlisting>
|
3610 |
<programlisting>
|
3606 |
<meta name="date" content="2013-02-24 17:50:00">
|
3611 |
<meta name="date" content="2013-02-24 17:50:00">
|
3607 |
</programlisting>
|
3612 |
</programlisting>
|
3608 |
</para>
|
3613 |
</para>
|
3609 |
|
3614 |
|
3610 |
<para>Filters also have the possibility to "invent" field
|
3615 |
<para>Input handlers also have the possibility to "invent" field
|
3611 |
names. These should also be output as meta tags:</para>
|
3616 |
names. These should also be output as meta tags:</para>
|
3612 |
|
3617 |
|
3613 |
<programlisting>
|
3618 |
<programlisting>
|
3614 |
<meta name="somefield" content="Some textual data" />
|
3619 |
<meta name="somefield" content="Some textual data" />
|
3615 |
</programlisting>
|
3620 |
</programlisting>
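As an illustration of emitting such meta tags, here is a small Python sketch. The function name `handler_html` is hypothetical, and the `Content-Type` line declaring UTF-8 and the use of a `pre` element for the body are assumptions of this example rather than requirements stated in this manual. Field names and values are escaped so that quotes or angle brackets in the data cannot break the markup.

```python
import html


def handler_html(body_text, fields):
    """Build a minimal HTML document carrying metadata in meta tags.

    `fields` maps field names to textual values, for example
    {"somefield": "Some textual data"}.
    """
    metas = "\n".join(
        '<meta name="{}" content="{}">'.format(
            html.escape(name, quote=True), html.escape(value, quote=True)
        )
        for name, value in fields.items()
    )
    return (
        "<html>\n<head>\n"
        # Assumption: declare the output encoding explicitly.
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        + metas
        + "\n</head>\n<body>\n<pre>"
        + html.escape(body_text)
        + "</pre>\n</body>\n</html>\n"
    )
```

A handler would print the returned string to stdout. How the fields are then indexed depends on the <filename>fields</filename> configuration file.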
|
|
... |
|
... |
3632 |
|
3637 |
|
3633 |
<sect2 id="RCL.PROGRAM.FILTERS.PAGES">
|
3638 |
<sect2 id="RCL.PROGRAM.FILTERS.PAGES">
|
3634 |
<title>Page numbers</title>
|
3639 |
<title>Page numbers</title>
|
3635 |
|
3640 |
|
3636 |
<para>The indexer will interpret <literal>^L</literal> characters
|
3641 |
<para>The indexer will interpret <literal>^L</literal> characters
|
3637 |
in the filter output as indicating page breaks, and will record
|
3642 |
in the handler output as indicating page breaks, and will record
|
3638 |
them. At query time, this allows starting a viewer on the right
|
3643 |
them. At query time, this allows starting a viewer on the right
|
3639 |
page for a hit or a snippet. Currently, only the PDF, Postscript
|
3644 |
page for a hit or a snippet. Currently, only the PDF, Postscript
|
3640 |
and DVI filters generate page breaks.</para>
|
3645 |
and DVI handlers generate page breaks.</para>
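A sketch of how a handler might mark page breaks, assuming it already has the text of each page as a separate string. `^L` is the form feed character (ASCII 12, `"\f"` in Python). The helper `page_of_offset` only illustrates why recording the breaks is useful; it is not a description of how the &RCL; indexer stores them, and both function names are hypothetical.

```python
FORM_FEED = "\f"  # the ^L character, ASCII 12


def join_pages(pages):
    """Join per-page text with form feeds so page breaks can be detected."""
    return FORM_FEED.join(pages)


def page_of_offset(text, offset):
    """Return the 1-based page number holding the character at `offset`."""
    return text.count(FORM_FEED, 0, offset) + 1
```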
|
3641 |
|
3646 |
|
3642 |
</sect2>
|
3647 |
</sect2>
|
3643 |
|
3648 |
|
3644 |
</sect1>
|
3649 |
</sect1>
|
3645 |
|
3650 |
|
|
... |
|
... |
3649 |
<para><literal>Fields</literal> are named pieces of information
|
3654 |
<para><literal>Fields</literal> are named pieces of information
|
3650 |
in or about documents, like <literal>title</literal>,
|
3655 |
in or about documents, like <literal>title</literal>,
|
3651 |
<literal>author</literal>, <literal>abstract</literal>.</para>
|
3656 |
<literal>author</literal>, <literal>abstract</literal>.</para>
|
3652 |
|
3657 |
|
3653 |
<para>The field values for documents can appear in several ways
|
3658 |
<para>The field values for documents can appear in several ways
|
3654 |
during indexing: either output by filters
|
3659 |
during indexing: either output by input handlers
|
3655 |
as <literal>meta</literal> fields in the HTML header section, or
|
3660 |
as <literal>meta</literal> fields in the HTML header section, or
|
3656 |
extracted from file extended attributes, or added as attributes
|
3661 |
extracted from file extended attributes, or added as attributes
|
3657 |
of the <literal>Doc</literal> object when using the API, or
|
3662 |
of the <literal>Doc</literal> object when using the API, or
|
3658 |
again synthesized internally by &RCL;.</para>
|
3663 |
again synthesized internally by &RCL;.</para>
|
3659 |
|
3664 |
|
3660 |
<para>The &RCL; query language allows searching for text in a
|
3665 |
<para>The &RCL; query language allows searching for text in a
|
3661 |
specific field.</para>
|
3666 |
specific field.</para>
|
3662 |
|
3667 |
|
3663 |
<para>&RCL; defines a number of default fields. Additional
|
3668 |
<para>&RCL; defines a number of default fields. Additional
|
3664 |
ones can be output by filters, and described in the
|
3669 |
ones can be output by handlers, and described in the
|
3665 |
<filename>fields</filename> configuration file.</para>
|
3670 |
<filename>fields</filename> configuration file.</para>
|
3666 |
|
3671 |
|
3667 |
<para>Fields can be:</para>
|
3672 |
<para>Fields can be:</para>
|
3668 |
<itemizedlist>
|
3673 |
<itemizedlist>
|
3669 |
|
3674 |
|
|
... |
|
... |
3901 |
|
3906 |
|
3902 |
<sect5 id="RCL.PROGRAM.PYTHON.RECOLL.CLASSES.DB">
|
3907 |
<sect5 id="RCL.PROGRAM.PYTHON.RECOLL.CLASSES.DB">
|
3903 |
<title>The Db class</title>
|
3908 |
<title>The Db class</title>
|
3904 |
|
3909 |
|
3905 |
<para>A Db object is created by
|
3910 |
<para>A Db object is created by
|
3906 |
a <literal>connect()</literal> function and holds a
|
3911 |
a <literal>connect()</literal> call and holds a
|
3907 |
connection to a Recoll index.</para>
|
3912 |
connection to a Recoll index.</para>
|
3908 |
<variablelist>
|
3913 |
<variablelist>
|
3909 |
<title>Methods</title>
|
3914 |
<title>Methods</title>
|
3910 |
<varlistentry>
|
3915 |
<varlistentry>
|
3911 |
<term>Db.close()</term>
|
3916 |
<term>Db.close()</term>
|
|
... |
|
... |
4379 |
<guilabel>File</guilabel> menu. The list is stored in the
|
4384 |
<guilabel>File</guilabel> menu. The list is stored in the
|
4380 |
<filename>missing</filename> text file inside the configuration
|
4385 |
<filename>missing</filename> text file inside the configuration
|
4381 |
directory.</para>
|
4386 |
directory.</para>
|
4382 |
|
4387 |
|
4383 |
<para>A list of common file types which need external
|
4388 |
<para>A list of common file types which need external
|
4384 |
commands follows. Many of the filters need the
|
4389 |
commands follows. Many of the handlers need the
|
4385 |
<command>iconv</command> command, which is not always listed as a
|
4390 |
<command>iconv</command> command, which is not always listed as a
|
4386 |
dependency.</para>
|
4391 |
dependency.</para>
|
4387 |
|
4392 |
|
4388 |
<para>Please note that, due to the relatively dynamic nature of this
|
4393 |
<para>Please note that, due to the relatively dynamic nature of this
|
4389 |
information, the most up to date version is now kept on &RCLAPPS;
|
4394 |
information, the most up to date version is now kept on &RCLAPPS;
|
|
... |
|
... |
4396 |
are sometimes outdated, or not the best version for &RCL;, so you
|
4401 |
are sometimes outdated, or not the best version for &RCL;, so you
|
4397 |
should take a look at &RCLAPPS; if a file
|
4402 |
should take a look at &RCLAPPS; if a file
|
4398 |
type is important to you.</para>
|
4403 |
type is important to you.</para>
|
4399 |
|
4404 |
|
4400 |
<para>As of &RCL; release 1.14, a number of XML-based formats that
|
4405 |
<para>As of &RCL; release 1.14, a number of XML-based formats that
|
4401 |
were handled by ad hoc filter code now use the
|
4406 |
were handled by ad hoc handler code now use the
|
4402 |
<command>xsltproc</command> command, which usually comes with
|
4407 |
<command>xsltproc</command> command, which usually comes with
|
4403 |
<application>libxslt</application>. These are: abiword, fb2
|
4408 |
<application>libxslt</application>. These are: abiword, fb2
|
4404 |
(ebooks), kword, openoffice, svg.</para>
|
4409 |
(ebooks), kword, openoffice, svg.</para>
|
4405 |
|
4410 |
|
4406 |
<para>Now for the list:</para>
|
4411 |
<para>Now for the list:</para>
|
|
... |
|
... |
4423 |
<command>antiword</command>. It is also useful to have
|
4428 |
<command>antiword</command>. It is also useful to have
|
4424 |
<command>wvWare</command> installed as it may be
|
4429 |
<command>wvWare</command> installed as it may be
|
4425 |
used as a fallback for some files which
|
4430 |
used as a fallback for some files which
|
4426 |
<command>antiword</command> does not handle.</para></listitem>
|
4431 |
<command>antiword</command> does not handle.</para></listitem>
|
4427 |
|
4432 |
|
4428 |
<listitem><para>MS Excel and PowerPoint need <command>
|
4433 |
<listitem><para>MS Excel and PowerPoint are processed by
|
4429 |
catdoc</command>.</para></listitem>
|
4434 |
internal <command>Python</command> handlers.</para></listitem>
|
4430 |
|
4435 |
|
4431 |
<listitem><para>MS Open XML (docx) needs <command>
|
4436 |
<listitem><para>MS Open XML (docx) needs <command>
|
4432 |
xsltproc</command>.</para></listitem>
|
4437 |
xsltproc</command>.</para></listitem>
|
4433 |
|
4438 |
|
4434 |
<listitem><para>Wordperfect files need <command>wpd2html</command>
|
4439 |
<listitem><para>Wordperfect files need <command>wpd2html</command>
|
|
... |
|
... |
4449 |
|
4454 |
|
4450 |
<listitem><para>djvu files need <command>djvutxt</command> and
|
4455 |
<listitem><para>djvu files need <command>djvutxt</command> and
|
4451 |
<command>djvused</command> from the
|
4456 |
<command>djvused</command> from the
|
4452 |
<application>DjVuLibre</application> package.</para></listitem>
|
4457 |
<application>DjVuLibre</application> package.</para></listitem>
|
4453 |
|
4458 |
|
4454 |
<listitem><para>Audio files: &RCL; releases before 1.13
|
4459 |
<listitem><para>Audio files: &RCL; releases 1.14 and later use
|
4455 |
used the <command>id3info</command> command from the <application>
|
|
|
4456 |
id3lib</application> package to extract mp3 tag information,
|
|
|
4457 |
<command>metaflac</command> (standard flac tools) for flac files,
|
|
|
4458 |
and <command>ogginfo</command> (vorbis tools) for ogg
|
|
|
4459 |
files. Releases 1.14 and later use a single
|
|
|
4460 |
<application>Python</application> filter based
|
4460 |
a single <application>Python</application> handler based
|
4461 |
on <application>mutagen</application> for all audio file
|
4461 |
on <application>mutagen</application> for all audio file
|
4462 |
types.</para>
|
4462 |
types.</para>
|
4463 |
</listitem>
|
4463 |
</listitem>
|
4464 |
|
4464 |
|
4465 |
<listitem><para>Pictures: &RCL; uses the
|
4465 |
<listitem><para>Pictures: &RCL; uses the
|
4466 |
<application>Exiftool</application>
|
4466 |
<application>Exiftool</application>
|
4467 |
<application>Perl</application> package to extract tag
|
4467 |
<application>Perl</application> package to extract tag
|
|
... |
|
... |
4469 |
there may not be much interest in indexing the technical tags
|
4469 |
there may not be much interest in indexing the technical tags
|
4470 |
(image size, aperture, etc.). This is only of interest if you
|
4470 |
(image size, aperture, etc.). This is only of interest if you
|
4471 |
store personal tags or textual descriptions inside the image
|
4471 |
store personal tags or textual descriptions inside the image
|
4472 |
files.</para></listitem>
|
4472 |
files.</para></listitem>
|
4473 |
|
4473 |
|
4474 |
<listitem><para>chm: files in microsoft help format need Python and
|
4474 |
<listitem><para>chm: files in Microsoft help format need Python and
|
4475 |
the <application>pychm</application> module (which needs
|
4475 |
the <application>pychm</application> module (which needs
|
4476 |
<application>chmlib</application>).</para></listitem>
|
4476 |
<application>chmlib</application>).</para></listitem>
|
4477 |
|
4477 |
|
4478 |
<listitem><para>ICS: up to &RCL; 1.13, iCalendar files need
|
4478 |
<listitem><para>ICS: up to &RCL; 1.13, iCalendar files need
|
4479 |
<application>Python</application>
|
4479 |
<application>Python</application>
|
|
... |
|
... |
4496 |
</listitem>
|
4496 |
</listitem>
|
4497 |
|
4497 |
|
4498 |
<listitem><para>Konqueror webarchive format with Python (uses the
|
4498 |
<listitem><para>Konqueror webarchive format with Python (uses the
|
4499 |
Tarfile module).</para></listitem>
|
4499 |
Tarfile module).</para></listitem>
|
4500 |
|
4500 |
|
4501 |
<listitem><para>mimehtml web archive format (support based on the email
|
4501 |
<listitem><para>Mimehtml web archive format (support based on
|
4502 |
filter, which introduces some mild weirdness, but still
|
4502 |
the email handler, which introduces some mild weirdness, but
|
4503 |
usable).</para></listitem>
|
4503 |
still usable).</para></listitem>
|
4504 |
|
4504 |
|
4505 |
</itemizedlist>
|
4505 |
</itemizedlist>
|
4506 |
|
4506 |
|
4507 |
<para>Text, HTML, email folders, and Scribus files are
|
4507 |
<para>Text, HTML, email folders, and Scribus files are
|
4508 |
processed internally. <application>Lyx</application> is used to
|
4508 |
processed internally. <application>Lyx</application> is used to
|
4509 |
index Lyx files. Many filters need <command>iconv</command> and the
|
4509 |
index Lyx files. Many handlers need <command>iconv</command> and the
|
4510 |
standard <command>sed</command> and <command>awk</command>.
|
4510 |
standard <command>sed</command> and <command>awk</command>.
|
4511 |
</para>
|
4511 |
</para>
|
4512 |
|
4512 |
|
4513 |
</sect1>
|
4513 |
</sect1>
|
4514 |
|
4514 |
|
|
... |
|
... |
4992 |
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ZIPSKIPPEDNAMES">
|
4992 |
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ZIPSKIPPEDNAMES">
|
4993 |
<term><varname>zipSkippedNames</varname></term>
|
4993 |
<term><varname>zipSkippedNames</varname></term>
|
4994 |
<listitem><para>A space-separated list of patterns for
|
4994 |
<listitem><para>A space-separated list of patterns for
|
4995 |
names of files or directories that should be ignored
|
4995 |
names of files or directories that should be ignored
|
4996 |
inside zip archives. This is used directly by the zip
|
4996 |
inside zip archives. This is used directly by the zip
|
4997 |
filter, and has a function similar to skippedNames, but
|
4997 |
handler, and has a function similar to skippedNames, but
|
4998 |
works independently. Can be redefined for filesystem
|
4998 |
works independently. Can be redefined for filesystem
|
4999 |
subdirectories. For versions up to 1.19, you will need
|
4999 |
subdirectories. For versions up to 1.19, you will need
|
5000 |
to update the Zip filter and install a supplementary
|
5000 |
to update the Zip handler and install a supplementary
|
5001 |
Python module. The details are
|
5001 |
Python module. The details are
|
5002 |
described <ulink url="https://bitbucket.org/medoc/recoll/wiki/Filtering%20out%20Zip%20archive%20members">on
|
5002 |
described <ulink url="https://bitbucket.org/medoc/recoll/wiki/Filtering%20out%20Zip%20archive%20members">on
|
5003 |
the &RCL; wiki</ulink>.
|
5003 |
the &RCL; wiki</ulink>.
|
5004 |
</para></listitem>
|
5004 |
</para></listitem>
|
5005 |
</varlistentry>
|
5005 |
</varlistentry>
|
|
... |
|
... |
5550 |
indexer (default class 3, no data).</para>
|
5550 |
indexer (default class 3, no data).</para>
|
5551 |
</listitem>
|
5551 |
</listitem>
|
5552 |
</varlistentry>
|
5552 |
</varlistentry>
|
5553 |
|
5553 |
|
5554 |
<varlistentry><term><varname>filtermaxseconds</varname></term>
|
5554 |
<varlistentry><term><varname>filtermaxseconds</varname></term>
|
5555 |
<listitem><para>Maximum filter execution time, after which it
|
5555 |
<listitem><para>Maximum handler execution time, after which it
|
5556 |
is aborted. Some postscript programs just loop...</para>
|
5556 |
is aborted. Some postscript programs just loop...</para>
|
5557 |
</listitem>
|
5557 |
</listitem>
|
5558 |
</varlistentry>
|
5558 |
</varlistentry>
|
5559 |
<varlistentry><term><varname>filtersdir</varname></term>
|
5559 |
<varlistentry><term><varname>filtersdir</varname></term>
|
5560 |
<listitem><para>A directory to search for the external
|
5560 |
<listitem><para>A directory to search for the external
|
5561 |
filter scripts used to index some types of files. The
|
5561 |
input handler scripts used to index some types of files. The
|
5562 |
value should not be changed, except if you want to modify
|
5562 |
value should not be changed, except if you want to modify
|
5563 |
one of the default scripts. The value can be redefined for
|
5563 |
one of the default scripts. The value can be redefined for
|
5564 |
any sub-directory. </para>
|
5564 |
any sub-directory. </para>
|
5565 |
</listitem>
|
5565 |
</listitem>
|
5566 |
</varlistentry>
|
5566 |
</varlistentry>
|
|
... |
|
... |
5676 |
canonical names used inside the <literal>[prefixes]</literal>
|
5676 |
canonical names used inside the <literal>[prefixes]</literal>
|
5677 |
and <literal>[stored]</literal> sections</para>
|
5677 |
and <literal>[stored]</literal> sections</para>
|
5678 |
</listitem>
|
5678 |
</listitem>
|
5679 |
</varlistentry>
|
5679 |
</varlistentry>
|
5680 |
<varlistentry>
|
5680 |
<varlistentry>
|
5681 |
<term>filter-specific sections</term>
|
5681 |
<term>handler-specific sections</term>
|
5682 |
<listitem><para>Some filters may need specific
|
5682 |
<listitem><para>Some input handlers may need specific
|
5683 |
configuration for handling fields. Only the email message filter
|
5683 |
configuration for handling fields. Only the email message handler
|
5684 |
currently has such a section (named
|
5684 |
currently has such a section (named
|
5685 |
<literal>[mail]</literal>). It allows indexing arbitrary email
|
5685 |
<literal>[mail]</literal>). It allows indexing arbitrary email
|
5686 |
headers in addition to the ones indexed by default. Other such
|
5686 |
headers in addition to the ones indexed by default. Other such
|
5687 |
sections may appear in the future.</para>
|
5687 |
sections may appear in the future.</para>
|
5688 |
</listitem>
|
5688 |
</listitem>
|
|
... |
|
... |
5692 |
|
5692 |
|
5693 |
<para>Here follows a small example of a personal
|
5693 |
<para>Here follows a small example of a personal
|
5694 |
<filename>fields</filename>
|
5694 |
<filename>fields</filename>
|
5695 |
file. This would extract a specific email header and
|
5695 |
file. This would extract a specific email header and
|
5696 |
use it as a searchable field, with data displayable inside result
|
5696 |
use it as a searchable field, with data displayable inside result
|
5697 |
lists. (Side note: as the email filter does no decoding on the values,
|
5697 |
lists. (Side note: as the email handler does no decoding on the values,
|
5698 |
only plain ascii headers can be indexed, and only the
|
5698 |
only plain ascii headers can be indexed, and only the
|
5699 |
first occurrence will be used for headers that occur several times).
|
5699 |
first occurrence will be used for headers that occur several times).
|
5700 |
|
5700 |
|
5701 |
<programlisting>[prefixes]
|
5701 |
<programlisting>[prefixes]
|
5702 |
# Index mailmytag contents (with the given prefix)
|
5702 |
# Index mailmytag contents (with the given prefix)
|
|
... |
|
... |
6005 |
(you can also create a category). Categories may be used
|
6005 |
(you can also create a category). Categories may be used
|
6006 |
for filtering in advanced search.</para>
|
6006 |
for filtering in advanced search.</para>
|
6007 |
</listitem>
|
6007 |
</listitem>
|
6008 |
</itemizedlist>
|
6008 |
</itemizedlist>
|
6009 |
|
6009 |
|
6010 |
<para>The <replaceable>rclblob</replaceable> filter should
|
6010 |
<para>The <replaceable>rclblob</replaceable> handler should
|
6011 |
be an executable program or script which exists inside
|
6011 |
be an executable program or script which exists inside
|
6012 |
<filename>/usr/[local/]share/recoll/filters</filename>. It
|
6012 |
<filename>/usr/[local/]share/recoll/filters</filename>. It
|
6013 |
will be given a file name as argument and should output the
|
6013 |
will be given a file name as argument and should output the
|
6014 |
text or html contents on the standard output.</para>
|
6014 |
text or html contents on the standard output.</para>
|
6015 |
|
6015 |
|
6016 |
<para>The <link linkend="RCL.PROGRAM.FILTERS">filter
|
6016 |
<para>The <link linkend="RCL.PROGRAM.FILTERS">filter
|
6017 |
programming</link> section describes in more detail how
|
6017 |
programming</link> section describes in more detail how
|
6018 |
to write a filter.</para>
|
6018 |
to write an input handler.</para>
|
6019 |
|
6019 |
|
6020 |
</sect3>
|
6020 |
</sect3>
|
6021 |
|
6021 |
|
6022 |
</sect2>
|
6022 |
</sect2>
|
6023 |
|
6023 |
|