
a/src/doc/user/usermanual.xml b/src/doc/user/usermanual.xml
...
...
5194
        <sect3 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT">
5194
        <sect3 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT">
5195
          <title>The rclextract module</title>
5195
          <title>The rclextract module</title>
5196
5196
5197
          <para>Index queries do not provide document content (only a
5197
          <para>Index queries do not provide document content (only a
5198
          partial and imprecise reconstruction is performed to show the
5198
          partial and imprecise reconstruction is performed to show the
5199
          snippets text). In order to access the actual document data, 
5199
          snippets text). In order to access the actual document data, the
5200
          the data extraction part of the indexing process
5200
          data extraction part of the indexing process must be performed
5201
          must be performed (subdocument access and format
5201
          (subdocument access and format translation). This is not trivial
5202
          translation). This is not trivial in
5202
          in the case of embedded documents. The
5203
          general. The <literal>rclextract</literal> module currently
5203
          <literal>rclextract</literal> module provides a single class
5204
          provides a single class which can be used to access the data
5204
          which can be used to access the data content for result
5205
          content for result documents.</para>
5205
          documents.</para>
5206
5206
5207
          <sect4 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES">
5207
          <sect4 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES">
5208
            <title>Classes</title>
5208
            <title>Classes</title>
5209
            
5209
            
5210
            <sect5 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES.EXTRACTOR">
5210
            <sect5 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES.EXTRACTOR">
...
...
5218
                  built from a <literal>Doc</literal> object, output
5218
                  built from a <literal>Doc</literal> object, output
5219
                  from a query.</para></listitem>
5219
                  from a query.</para></listitem>
5220
                </varlistentry>
5220
                </varlistentry>
5221
                <varlistentry>
5221
                <varlistentry>
5222
                  <term>Extractor.textextract(ipath)</term>
5222
                  <term>Extractor.textextract(ipath)</term>
5223
                  <listitem><para>Extract document defined
5223
                  <listitem><para>Extract document defined by
5224
                  by <replaceable>ipath</replaceable> and return
5224
                  <replaceable>ipath</replaceable> and return a
5225
                  a <literal>Doc</literal> object. The doc.text field
5225
                  <literal>Doc</literal> object. The
5226
                  has the document text converted to either text/plain or
5226
                  <literal>doc.text</literal> field has the document text
5227
                  text/html according to doc.mimetype. The typical use
5227
                  converted to either text/plain or text/html according to
5228
                  <literal>doc.mimetype</literal>. The typical use would be
5228
                  would be as follows:
5229
                  as follows:</para>
5229
                  <programlisting>
5230
<programlisting>
5230
                    qdoc = query.fetchone()
5231
qdoc = query.fetchone()
5231
                    extractor = recoll.Extractor(qdoc)
5232
extractor = recoll.Extractor(qdoc)
5232
                    doc = extractor.textextract(qdoc.ipath)
5233
doc = extractor.textextract(qdoc.ipath)
5233
                    # use doc.text, e.g. for previewing
5234
# use doc.text, e.g. for previewing</programlisting>
5234
                  </programlisting>
5235
                  <para>Passing <literal>qdoc.ipath</literal> to
5236
                  <literal>textextract()</literal> is redundant, but
5237
                  reflects the fact that the <literal>Extractor</literal>
5238
                  object actually has the capability to access the other
5239
                  entries in a compound document.</para>
5235
                  </para></listitem>
5240
                  </listitem>
5236
                </varlistentry>
5241
                </varlistentry>
5237
                <varlistentry>
5242
                <varlistentry>
5238
                  <term>Extractor.idoctofile(ipath, targetmtype, outfile='')</term>
5243
                  <term>Extractor.idoctofile(ipath, targetmtype, outfile='')</term>
5239
                  <listitem><para>Extracts document into an output file,
5244
                  <listitem><para>Extracts document into an output file,
5240
                  which can be given explicitly or will be created as a
5245
                  which can be given explicitly or will be created as a
5241
                  temporary file to be deleted by the caller. Typical use:
5246
                  temporary file to be deleted by the caller. Typical
5242
                  <programlisting>
5247
                  use:</para> 
5243
                    qdoc = query.fetchone()
5248
<programlisting>
5249
qdoc = query.fetchone()
5244
                    extractor = recoll.Extractor(qdoc)
5250
extractor = recoll.Extractor(qdoc)
5245
                  filename = extractor.idoctofile(qdoc.ipath, qdoc.mimetype)</programlisting>
5251
filename = extractor.idoctofile(qdoc.ipath, qdoc.mimetype)</programlisting>
5246
5252
 
5253
                  <para>In all cases the output is a copy, even if the
5254
                  requested document is a regular system file, which may be
5255
                  wasteful in some cases. If you want to avoid this, you
5256
                  can test for a simple file document as follows:
5257
<programlisting>
5258
not doc.ipath and (not "rclbes" in doc.keys() or doc["rclbes"] == "FS")
5259
</programlisting>
5247
                  </para></listitem>
5260
                  </para></listitem>
5248
                </varlistentry>
5261
                </varlistentry>
5249
5262
5250
              </variablelist>
5263
              </variablelist>
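The textextract() entry says that doc.text holds either text/plain or text/html, as reported by doc.mimetype. A minimal sketch of how a caller might branch on that value follows. The helper name preview_text and the tag-stripping approach are illustrative assumptions, not part of the Recoll API; only the text and mimetype fields come from the manual.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect the character data of an HTML document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def preview_text(doc):
    """Return plain text for a Doc-like object (hypothetical helper).

    `doc` only needs the `text` and `mimetype` attributes described
    for the object returned by Extractor.textextract().
    """
    if doc.mimetype == "text/html":
        collector = _TextCollector()
        collector.feed(doc.text)
        collector.close()
        # Note: this naive pass also keeps script and style contents.
        return "".join(collector.chunks)
    # Anything else is treated as text/plain and returned unchanged.
    return doc.text
```

A GUI previewer would more likely hand text/html straight to an HTML widget; stripping tags is just the simplest way to show the branch.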
5251
5264
5252
            </sect5> <!-- Extractor class -->
5265
            </sect5> <!-- Extractor class -->
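Because idoctofile() leaves deletion of the temporary file to the caller, it can help to tie the cleanup to a scope. The following is a sketch only: extracted_file is a hypothetical wrapper, and it assumes that the value returned by idoctofile() is a path usable with the os module.

```python
import os
from contextlib import contextmanager


@contextmanager
def extracted_file(extractor, ipath, mimetype):
    """Yield the name of the copy made by idoctofile(), then delete it.

    Hypothetical convenience wrapper; `extractor` is expected to offer
    idoctofile(ipath, targetmtype, outfile='') as described in the manual.
    """
    filename = extractor.idoctofile(ipath, mimetype)
    try:
        yield filename
    finally:
        # The caller owns the temporary file, so remove it when done.
        if os.path.exists(filename):
            os.remove(filename)
```

With a real result this would be used as `with extracted_file(extractor, qdoc.ipath, qdoc.mimetype) as fn: ...`, and the copy is removed even if the body raises.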
5253
          </sect4> <!-- rclextract classes -->
5266
          </sect4> <!-- rclextract classes -->
5254
        </sect3> <!-- rclextract module -->
5267
        </sect3> <!-- rclextract module -->
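The simple-file test added in this change can be wrapped in a small predicate so that callers skip the copy when it is not needed. This is a sketch under the assumption that the Doc object supports the ipath attribute, keys() and item access exactly as the manual's expression uses them; is_plain_file is a hypothetical name.

```python
def is_plain_file(doc):
    """Return True if `doc` is a regular file system file.

    Same logic as the expression in the manual: no internal path, and
    either no "rclbes" backend field or a backend equal to "FS".
    """
    if doc.ipath:
        # A non-empty ipath designates an entry inside a compound document.
        return False
    return "rclbes" not in doc.keys() or doc["rclbes"] == "FS"
```

A caller could then open the original file directly when is_plain_file(qdoc) is true and fall back to idoctofile() otherwise.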
5268
5255
5269
5256
        <sect3 id="RCL.PROGRAM.PYTHONAPI.SEARCH.EXAMPLE">
5270
        <sect3 id="RCL.PROGRAM.PYTHONAPI.SEARCH.EXAMPLE">
5257
          <title>Search API usage example</title>
5271
          <title>Search API usage example</title>
5258
5272
5259
          <para>The following sample would query the index with a user
5273
          <para>The following sample would query the index with a user
...
...
5261
          directory inside the &RCL; source for other
5275
          directory inside the &RCL; source for other
5262
          examples. The <filename>recollgui</filename> subdirectory
5276
          examples. The <filename>recollgui</filename> subdirectory
5263
          has a very embryonic GUI which demonstrates the
5277
          has a very embryonic GUI which demonstrates the
5264
          highlighting and data extraction functions.</para>
5278
          highlighting and data extraction functions.</para>
5265
5279
5266
          <programlisting>
5280
<programlisting><![CDATA[
5267
            #!/usr/bin/env python
5281
#!/usr/bin/env python
5268
            <![CDATA[
5282
5269
                     from recoll import recoll
5283
from recoll import recoll
5270
5284
5271
db = recoll.connect()
5285
db = recoll.connect()
5272
db.setAbstractParams(maxchars=80, contextwords=4)
5286
db.setAbstractParams(maxchars=80, contextwords=4)
5273
5287
5274
query = db.query()
5288
query = db.query()
5275
nres = query.execute("some user question")
5289
nres = query.execute("some user question")
5276
print "Result count: ", nres
5290
print "Result count: ", nres
5277
if nres > 5:
5291
if nres > 5:
5278
nres = 5
5292
    nres = 5
5279
for i in range(nres):
5293
for i in range(nres):
5280
doc = query.fetchone()
5294
    doc = query.fetchone()
5281
print "Result #%d" % (query.rownumber,)
5295
    print "Result #%d" % (query.rownumber,)
5282
for k in ("title", "size"):
5296
    for k in ("title", "size"):
5283
print k, ":", getattr(doc, k).encode('utf-8')
5297
        print k, ":", getattr(doc, k).encode('utf-8')
5284
abs = db.makeDocAbstract(doc, query).encode('utf-8')
5298
    abs = db.makeDocAbstract(doc, query).encode('utf-8')
5285
print abs
5299
    print abs
5286
print
5300
    print
5287
5301
]]></programlisting>
5288
            ]]>
5289
          </programlisting>
5290
5302
5291
        </sect3>
5303
        </sect3>
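The sample in this section uses Python 2 print statements. For readers on Python 3, a rough port might look like the sketch below. It uses only the calls that appear in the sample, wrapped in a hypothetical show_results function, and it assumes that the Python 3 build of the module returns str values, so the encode('utf-8') calls are dropped.

```python
def show_results(db, question, limit=5):
    """Print up to `limit` results for `question` (hypothetical helper).

    `db` is expected to be the object returned by recoll.connect().
    Returns the number of results printed.
    """
    query = db.query()
    nres = query.execute(question)
    print("Result count:", nres)
    shown = min(nres, limit)
    for _ in range(shown):
        doc = query.fetchone()
        print("Result #%d" % (query.rownumber,))
        for k in ("title", "size"):
            print(k, ":", getattr(doc, k))
        print(db.makeDocAbstract(doc, query))
        print()
    return shown


# Usage against a real index (needs the Recoll Python module):
#   from recoll import recoll
#   db = recoll.connect()
#   db.setAbstractParams(maxchars=80, contextwords=4)
#   show_results(db, "some user question")
```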
5292
      </sect2>
5304
      </sect2>
5293
5305
5294
5306