|
a/src/internfile/mh_execm.h |
|
b/src/internfile/mh_execm.h |
|
... |
|
... |
29 |
* This version uses persistent filters which can handle multiple requests
|
29 |
* This version uses persistent filters which can handle multiple requests
|
30 |
* without exiting (both multiple files and multiple documents per file),
|
30 |
* without exiting (both multiple files and multiple documents per file),
|
31 |
* with a simple question/response protocol.
|
31 |
* with a simple question/response protocol.
|
32 |
*
|
32 |
*
|
33 |
* The data is exchanged in TLV fashion, in a way that should be
|
33 |
* The data is exchanged in TLV fashion, in a way that should be
|
34 |
* usable in most script languages. The basic unit has one line with a
|
34 |
* usable in most script languages. The basic unit of data has one line
|
35 |
* data type and a count, followed by the data. A 'message' ends with
|
35 |
* with a data type and a count (both ASCII), followed by the data. A
|
36 |
* one empty line. A possible exchange:
|
36 |
* 'message' is made of one or several units or tags and ends with one empty
|
|
|
37 |
* line.
|
37 |
*
|
38 |
*
|
38 |
* From recollindex (the message begins before 'Filename'):
|
39 |
* Example from recollindex (the message begins before 'Filename' and has
|
|
|
40 |
* 'Filename' and 'Ipath' tags):
|
39 |
*
|
41 |
*
|
40 |
Filename: 24
|
42 |
Filename: 24
|
41 |
/my/home/mail/somefolderIpath: 2
|
43 |
/my/home/mail/somefolderIpath: 2
|
42 |
22
|
44 |
22
|
43 |
|
45 |
|
44 |
<Message ends here: because of the empty line after '22'
|
46 |
<Message ends here: because of the empty line after '22'
|
45 |
|
47 |
|
46 |
*
|
48 |
*
|
47 |
* Example answer:
|
49 |
* Example answer, with 'Mimetype' and 'Data' tags
|
48 |
*
|
50 |
*
|
49 |
Mimetype: 10
|
51 |
Mimetype: 10
|
50 |
text/plainData: 10
|
52 |
text/plainData: 10
|
51 |
0123456789
|
53 |
0123456789
|
52 |
|
54 |
|
53 |
<Message ends here because of empty line
|
55 |
<Message ends here because of empty line
|
54 |
|
56 |
|
55 |
*
|
57 |
*
|
56 |
* This format is both extensible and reasonably easy to parse.
|
58 |
* This format is both extensible and reasonably easy to parse.
|
57 |
* While it's more fitted for python or perl on the script side, it
|
59 |
* While it's more fitted for python or perl on the script side, it
|
58 |
* should even be sort of usable from the shell (ie: use dd to read
|
60 |
* should even be sort of usable from the shell (e.g.: use dd to read
|
59 |
* the counted data). Most alternatives would need data encoding in
|
61 |
* the counted data). Most alternatives would need data encoding in
|
60 |
* some cases.
|
62 |
* some cases.
|
61 |
*
|
63 |
*
|
62 |
* Higher level dialog:
|
64 |
* Higher level dialog:
|
63 |
* The c++ program is the master and sends request messages to the script. The
|
65 |
* The C++ program is the master and sends request messages to the script.
|
64 |
* requests have the following fields:
|
66 |
* Both sides of the communication should be prepared to receive and discard
|
|
|
67 |
* unknown tags.
|
|
|
68 |
* The messages normally have the following tags:
|
65 |
* - Filename: the file to process. This can be empty meaning that we
|
69 |
* - Filename: the file to process. This can be empty meaning that we
|
66 |
* are requesting the next document in the current file.
|
70 |
* are requesting the next document in the current file.
|
67 |
* - Ipath: this will be present only if we are requesting a specific
|
71 |
* - Ipath: this will be present only if we are requesting a specific
|
68 |
* subdocument inside a container file (typically for preview, at query
|
72 |
* subdocument inside a container file (typically for preview, at query
|
69 |
* time). Absent during indexing (ipaths are generated and sent back from
|
73 |
* time). Absent during indexing (ipaths are generated and sent back from
|
70 |
* the script
|
74 |
* the script)
|
71 |
* - Mimetype: this is the mime type for the (possibly container) file.
|
75 |
* - Mimetype: this is the mime type for the (possibly container) file.
|
72 |
* Can be useful to filters which handle multiple types, like rclaudio.
|
76 |
* Can be useful to filters which handle multiple types, like rclaudio.
|
73 |
*
|
77 |
*
|
74 |
* The script answers with messages having the following fields:
|
78 |
* The script answers with messages having the following fields:
|
75 |
* - Document: translated document data (typically, but not always, html)
|
79 |
* - Document: translated document data.
|
76 |
* - Ipath: ipath for the returned document. Can be used at query time to
|
80 |
* - Ipath: ipath for the returned document. Can be used at query time to
|
77 |
* extract a specific subdocument for preview. Not present or empty for
|
81 |
* extract a specific subdocument for preview. Not present or empty for
|
78 |
* non-container files.
|
82 |
* non-container files and for the "self" document of a container.
|
79 |
* - Mimetype: mime type for the returned data (ie: text/html, text/plain)
|
83 |
* - Mimetype: mime type for the returned data.
|
|
|
84 |
* This is optional. For multi-document filters, if mimetype is
|
|
|
85 |
* not present in the answer, the ipath must be a file-name-like
|
|
|
86 |
* string which will be used to divine the mime type (this is used
|
|
|
87 |
* typically with archives like Zip or Tar). If this fails,
|
|
|
88 |
* the document will be handled as unknown type and the contents won't
|
|
|
89 |
* be indexed. When neither ipath nor mimetype are present the default
|
|
|
90 |
* is to attempt to treat the document as HTML.
|
|
|
91 |
* - Charset: for document types for which it makes sense, and if the filter
|
|
|
92 |
* has the information.
|
80 |
* - Eofnow: empty field: no document is returned and we're at eof.
|
93 |
* - Eofnow: empty field: no document is returned and we're at eof.
|
81 |
* - Eofnext: empty field: file ends after the doc returned by this message.
|
94 |
* - Eofnext: empty field: file ends after the doc returned by this message.
|
82 |
* - SubdocError: no subdoc returned by this request, but file goes on.
|
95 |
* - SubdocError: no subdoc returned by this request, but file goes on.
|
83 |
* (the indexer (1.14) treats this as a file-fatal error anyway).
|
|
|
84 |
* - FileError: error, stop for this file.
|
96 |
* - FileError: error, stop for this file.
|
85 |
*/
|
97 |
*/
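The unit format described in the comment above ("Name: count" on one line, then exactly count bytes of data, with an empty line ending the message) can be sketched as a small encoder/decoder pair. This is only an illustrative sketch, not the code in mh_execm.cpp: `Unit`, `encodeMessage` and `decodeMessage` are hypothetical names. It assumes that the terminating empty line directly follows the last unit's data, and that the tag name is everything before the last colon on the header line.

```cpp
#include <cstddef>
#include <exception>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One unit of data: tag name and raw payload (hypothetical type for this sketch).
using Unit = std::pair<std::string, std::string>;

// Serialize units as "Name: count\n" followed by exactly `count` bytes of data.
// A single empty line terminates the message.
std::string encodeMessage(const std::vector<Unit>& units) {
    std::string out;
    for (const auto& u : units) {
        out += u.first + ": " + std::to_string(u.second.size()) + "\n";
        out += u.second;  // counted data: no escaping or encoding needed
    }
    out += "\n";
    return out;
}

// Read one message from `in` into `units`.
// Returns false on a malformed header, short data, or EOF before the empty line.
bool decodeMessage(std::istream& in, std::vector<Unit>& units) {
    units.clear();
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty())
            return true;  // empty line: end of message
        const std::string::size_type colon = line.rfind(':');
        if (colon == std::string::npos)
            return false;
        std::size_t count = 0;
        try {
            count = std::stoul(line.substr(colon + 1));
        } catch (const std::exception&) {
            return false;  // count is not a number
        }
        std::string data(count, '\0');
        if (count > 0) {
            in.read(&data[0], static_cast<std::streamsize>(count));
            if (static_cast<std::size_t>(in.gcount()) != count)
                return false;  // stream ended inside the counted data
        }
        units.emplace_back(line.substr(0, colon), data);
    }
    return false;
}
```

Because the data is counted rather than delimited, a payload may itself contain newlines or colons without confusing the reader. The decoder returns every unit it sees, so a caller can simply skip tags it does not know, in line with the comment's advice that both sides discard unknown tags.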
|
86 |
class MimeHandlerExecMultiple : public MimeHandlerExec {
|
98 |
class MimeHandlerExecMultiple : public MimeHandlerExec {
|
87 |
/////////
|
99 |
/////////
|
88 |
// Things not reset by "clear()", additionally to those in MimeHandlerExec
|
100 |
// Things not reset by "clear()", additionally to those in MimeHandlerExec
|