
a/src/internfile/mh_execm.h b/src/internfile/mh_execm.h
29
 * This version uses persistent filters which can handle multiple requests 
29
 * This version uses persistent filters which can handle multiple requests 
30
 * without exiting (both multiple files and multiple documents per file), 
30
 * without exiting (both multiple files and multiple documents per file), 
31
 * with a simple question/response protocol.
31
 * with a simple question/response protocol.
32
 *
32
 *
33
 * The data is exchanged in TLV fashion, in a way that should be
33
 * The data is exchanged in TLV fashion, in a way that should be
34
 * usable in most script languages. The basic unit has one line with a
34
 * usable in most script languages. The basic unit of data has one line 
35
 * data type and a count, followed by the data. A 'message' ends with
35
 * with a data type and a count (both ASCII), followed by the data. A
36
 * one empty line. A possible exchange:
36
 * 'message' is made of one or several units or tags and ends with one empty
37
 * line. 
37
 * 
38
 * 
38
 * From recollindex (the message begins before 'Filename'):
39
 * Example from recollindex (the message begins before 'Filename' and has
40
 * 'Filename' and 'Ipath' tags):
39
 * 
41
 * 
40
Filename: 24
42
Filename: 24
41
/my/home/mail/somefolderIpath: 2
43
/my/home/mail/somefolderIpath: 2
42
22
44
22
43
45
44
<Message ends here: because of the empty line after '22'
46
<Message ends here: because of the empty line after '22'
45
47
46
 * 
48
 * 
47
 * Example answer:
49
 * Example answer, with 'Mimetype' and 'Data' tags
48
 * 
50
 * 
49
Mimetype: 10
51
Mimetype: 10
50
text/plainData: 10
52
text/plainData: 10
51
0123456789
53
0123456789
52
54
53
<Message ends here because of empty line
55
<Message ends here because of empty line
54
56
55
 *        
57
 *        
56
 * This format is both extensible and reasonably easy to parse. 
58
 * This format is both extensible and reasonably easy to parse. 
57
 * While it's more fitted for python or perl on the script side, it
59
 * While it's more fitted for python or perl on the script side, it
58
 * should even be sort of usable from the shell (ie: use dd to read
60
 * should even be sort of usable from the shell (e.g.: use dd to read
59
 * the counted data). Most alternatives would need data encoding in
61
 * the counted data). Most alternatives would need data encoding in
60
 * some cases.
62
 * some cases.
61
 *
63
 *
62
 * Higher level dialog:
64
 * Higher level dialog:
63
 * The c++ program is the master and sends request messages to the script. The
65
 * The C++ program is the master and sends request messages to the script. 
64
 * requests have the following fields:
66
 * Both sides of the communication should be prepared to receive and discard 
67
 * unknown tags.
68
 * The messages normally have the following tags:
65
 *  - Filename: the file to process. This can be empty meaning that we 
69
 *  - Filename: the file to process. This can be empty meaning that we 
66
 *      are requesting the next document in the current file.
70
 *      are requesting the next document in the current file.
67
 *  - Ipath: this will be present only if we are requesting a specific 
71
 *  - Ipath: this will be present only if we are requesting a specific 
68
 *      subdocument inside a container file (typically for preview, at query 
72
 *      subdocument inside a container file (typically for preview, at query 
69
 *      time). Absent during indexing (ipaths are generated and sent back from
73
 *      time). Absent during indexing (ipaths are generated and sent back from
70
 *      the script
74
 *      the script)
71
 *  - Mimetype: this is the mime type for the (possibly container) file. 
75
 *  - Mimetype: this is the mime type for the (possibly container) file. 
72
 *      Can be useful to filters which handle multiple types, like rclaudio.
76
 *    Can be useful to filters which handle multiple types, like rclaudio.
73
 *      
77
 *      
74
 * The script answers with messages having the following fields:
78
 * The script answers with messages having the following fields:
75
 *   - Document: translated document data (typically, but not always, html)
79
 *   - Document: translated document data.
76
 *   - Ipath: ipath for the returned document. Can be used at query time to
80
 *   - Ipath: ipath for the returned document. Can be used at query time to
77
 *       extract a specific subdocument for preview. Not present or empty for 
81
 *     extract a specific subdocument for preview. Not present or empty for 
78
 *       non-container files.
82
 *     non-container files and for the "self" document of a container.
79
 *   - Mimetype: mime type for the returned data (ie: text/html, text/plain)
83
 *   - Mimetype: mime type for the returned data.
84
 *     This is optional. For multi-document filters, if mimetype is
85
 *     not present in the answer, the ipath must be a file-name-like
86
 *     string which will be used to divine the mime type (this is used
87
 *     typically with archives like Zip or Tar). If this fails,
88
 *     the document will be handled as unknown type and the contents won't 
89
 *     be indexed. When neither ipath nor mimetype are present the default 
90
 *     is to attempt to treat the document as HTML.
91
 *   - Charset: for document types for which it makes sense, and if the filter
92
 *     has the information.
80
 *   - Eofnow: empty field: no document is returned and we're at eof.
93
 *   - Eofnow: empty field: no document is returned and we're at eof.
81
 *   - Eofnext: empty field: file ends after the doc returned by this message.
94
 *   - Eofnext: empty field: file ends after the doc returned by this message.
82
 *   - SubdocError: no subdoc returned by this request, but file goes on.
95
 *   - SubdocError: no subdoc returned by this request, but file goes on.
83
 *      (the indexer (1.14) treats this as a file-fatal error anyway).
84
 *   - FileError: error, stop for this file.
96
 *   - FileError: error, stop for this file.
85
 */
97
 */
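The framing described in the comment above can be sketched roughly as follows. This is only an illustration, not the code from the Recoll sources: the names `Message`, `encodeUnit`, `encodeMessage`, `readMessage` and `decodeMessage` are hypothetical. It assumes, from the two examples, that a unit is a `Name: count` line followed by exactly `count` data bytes with no separator after them, and that a bare empty line ends the message. Unknown tags are read (to stay in sync with the stream) and then discarded, as the comment asks of both sides.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <exception>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type: one decoded message, tag name -> raw data bytes.
using Message = std::map<std::string, std::string>;

// One unit: "<Name>: <count>\n" followed by exactly <count> data bytes.
std::string encodeUnit(const std::string& name, const std::string& data) {
    // A tag name containing ':' or a newline could not be parsed back.
    assert(name.find_first_of(":\n") == std::string::npos);
    return name + ": " + std::to_string(data.size()) + "\n" + data;
}

// A message: its units in the given order, then one empty line.
std::string encodeMessage(
    const std::vector<std::pair<std::string, std::string>>& units) {
    std::string out;
    for (const auto& unit : units)
        out += encodeUnit(unit.first, unit.second);
    out += "\n";
    return out;
}

// Read one message from 'in', keeping only the tags listed in 'known'.
// Returns false on EOF before the terminating empty line, on a malformed
// header, or on truncated data.
bool readMessage(std::istream& in, const std::set<std::string>& known,
                 Message& msg) {
    msg.clear();
    std::string header;
    while (std::getline(in, header)) {
        if (header.empty())
            return true;  // empty line: end of message
        const std::string::size_type colon = header.find(':');
        if (colon == std::string::npos || colon == 0)
            return false;
        const std::string name = header.substr(0, colon);
        const std::string::size_type pos =
            header.find_first_not_of(" \t", colon + 1);
        if (pos == std::string::npos ||
            !std::isdigit(static_cast<unsigned char>(header[pos])))
            return false;
        std::size_t count = 0;
        try {
            count = std::stoul(header.substr(pos));
        } catch (const std::exception&) {
            return false;  // count does not fit in unsigned long
        }
        // The data must be consumed even for unknown tags.
        std::string data(count, '\0');
        if (count > 0) {
            in.read(&data[0], static_cast<std::streamsize>(count));
            if (static_cast<std::size_t>(in.gcount()) != count)
                return false;  // truncated data
        }
        if (known.count(name) != 0)
            msg[name] = data;
    }
    return false;
}

// Convenience wrapper for decoding a message held in a string.
bool decodeMessage(const std::string& bytes,
                   const std::set<std::string>& known, Message& msg) {
    std::istringstream in(bytes);
    return readMessage(in, known, msg);
}
```

With these helpers, `encodeMessage({{"Filename", "/my/home/mail/somefolder"}, {"Ipath", "22"}})` yields the bytes `Filename: 24\n/my/home/mail/somefolderIpath: 2\n22\n`. Here the newline right after `22` is itself the empty line that ends the message. A real reader would probably also cap `count` before allocating, since the value comes from the peer.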
86
class MimeHandlerExecMultiple : public MimeHandlerExec {
98
class MimeHandlerExecMultiple : public MimeHandlerExec {
87
    /////////
99
    /////////
88
    // Things not reset by "clear()", additionally to those in MimeHandlerExec
100
    // Things not reset by "clear()", additionally to those in MimeHandlerExec