a/src/doc/user/usermanual.html b/src/doc/user/usermanual.html
...
...
6665
            <p>Index queries do not provide document content (only
6665
            <p>Index queries do not provide document content (only
6666
            a partial and imprecise reconstruction is performed to
6666
            a partial and imprecise reconstruction is performed to
6667
            show the snippet text). In order to access the actual
6667
            show the snippet text). In order to access the actual
6668
            document data, the data extraction part of the indexing
6668
            document data, the data extraction part of the indexing
6669
            process must be performed (subdocument access and
6669
            process must be performed (subdocument access and
6670
            format translation). This is not trivial in general.
6670
            format translation). This is not trivial in the case of
6671
            The <code class="literal">rclextract</code> module
6671
            embedded documents. The <code class=
6672
            currently provides a single class which can be used to
6672
            "literal">rclextract</code> module provides a single
6673
            access the data content for result documents.</p>
6673
            class which can be used to access the data content for
6674
            result documents.</p>
6674
            <div class="sect4">
6675
            <div class="sect4">
6675
              <div class="titlepage">
6676
              <div class="titlepage">
6676
                <div>
6677
                <div>
6677
                  <div>
6678
                  <div>
6678
                    <h5 class="title"><a name=
6679
                    <h5 class="title"><a name=
...
...
6707
                    "term">Extractor.textextract(ipath)</span></dt>
6708
                    "term">Extractor.textextract(ipath)</span></dt>
6708
                    <dd>
6709
                    <dd>
6709
                      <p>Extract document defined by <em class=
6710
                      <p>Extract document defined by <em class=
6710
                      "replaceable"><code>ipath</code></em> and
6711
                      "replaceable"><code>ipath</code></em> and
6711
                      return a <code class="literal">Doc</code>
6712
                      return a <code class="literal">Doc</code>
6712
                      object. The doc.text field has the document
6713
                      object. The <code class=
6714
                      "literal">doc.text</code> field has the
6713
                      text converted to either text/plain or
6715
                      document text converted to either text/plain
6714
                      text/html according to doc.mimetype. The
6716
                      or text/html according to <code class=
6717
                      "literal">doc.mimetype</code>. The typical
6715
                      typical use would be as follows:</p>
6718
                      use would be as follows:</p>
6716
                      <pre class="programlisting">
6719
                      <pre class="programlisting">
6717
                    qdoc = query.fetchone()
6720
qdoc = query.fetchone()
6718
                    extractor = recoll.Extractor(qdoc)
6721
extractor = recoll.Extractor(qdoc)
6719
                    doc = extractor.textextract(qdoc.ipath)
6722
doc = extractor.textextract(qdoc.ipath)
6720
                    # use doc.text, e.g. for previewing
6723
# use doc.text, e.g. for previewing</pre>
6721
                  </pre>
6724
                      <p>Passing <code class=
6725
                      "literal">qdoc.ipath</code> to <code class=
6726
                      "literal">textextract()</code> is redundant,
6727
                      but reflects the fact that the <code class=
6728
                      "literal">Extractor</code> object actually
6729
                      has the capability to access the other
6730
                      entries in a compound document.</p>
6722
                    </dd>
6731
                    </dd>
6723
                    <dt><span class=
6732
                    <dt><span class=
6724
                    "term">Extractor.idoctofile(ipath, targetmtype,
6733
                    "term">Extractor.idoctofile(ipath, targetmtype,
6725
                    outfile='')</span></dt>
6734
                    outfile='')</span></dt>
6726
                    <dd>
6735
                    <dd>
6727
                      <p>Extracts document into an output file,
6736
                      <p>Extracts document into an output file,
6728
                      which can be given explicitly or will be
6737
                      which can be given explicitly or will be
6729
                      created as a temporary file to be deleted by
6738
                      created as a temporary file to be deleted by
6730
                      the caller. Typical use:</p>
6739
                      the caller. Typical use:</p>
6731
                      <pre class="programlisting">
6740
                      <pre class="programlisting">
6732
                    qdoc = query.fetchone()
6741
qdoc = query.fetchone()
6733
                    extractor = recoll.Extractor(qdoc)
6742
extractor = recoll.Extractor(qdoc)
6734
                  filename = extractor.idoctofile(qdoc.ipath, qdoc.mimetype)</pre>
6743
filename = extractor.idoctofile(qdoc.ipath, qdoc.mimetype)</pre>
6744
                      <p>In all cases the output is a copy, even if
6745
                      the requested document is a regular system
6746
                      file, which may be wasteful in some cases. If
6747
                      you want to avoid this, you can test for a
6748
                      simple file document as follows:</p>
6749
                      <pre class="programlisting">
6750
not doc.ipath and (not "rclbes" in doc.keys() or doc["rclbes"] == "FS")
6751
</pre>
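The simple-file test above can be wrapped in a small helper so that callers skip the copy made by idoctofile() when it is not needed. This is only a sketch: FakeDoc is a hypothetical stand-in used to exercise the helper without an index, and is_plain_file is an illustrative name, not part of the recoll module. The "BGL" value is just an example of a backend other than "FS".

```python
class FakeDoc(dict):
    """Hypothetical stand-in for a recoll Doc (not part of the module).

    It only mimics what the test needs: dict-style field access,
    keys(), and an ipath attribute.
    """

    def __init__(self, ipath="", **fields):
        super().__init__(**fields)
        self.ipath = ipath


def is_plain_file(doc):
    """Return True if doc is a regular file system file.

    Same expression as in the manual: no internal path, and either no
    backend field at all or a backend equal to "FS".
    """
    return not doc.ipath and ("rclbes" not in doc.keys() or doc["rclbes"] == "FS")


# Illustrative documents: a plain file, an entry inside a compound
# document, and a document coming from a backend other than "FS".
regular = FakeDoc(ipath="", url="file:///tmp/a.txt")
embedded = FakeDoc(ipath="2", url="file:///tmp/mail.mbox", rclbes="FS")
other_backend = FakeDoc(ipath="", rclbes="BGL")
```

With a real result, one would presumably call is_plain_file(qdoc) first and only fall back to extractor.idoctofile(qdoc.ipath, qdoc.mimetype) when it returns False.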
6735
                    </dd>
6752
                    </dd>
6736
                  </dl>
6753
                  </dl>
6737
                </div>
6754
                </div>
6738
              </div>
6755
              </div>
6739
            </div>
6756
            </div>
...
...
6756
            other examples. The <code class=
6773
            other examples. The <code class=
6757
            "filename">recollgui</code> subdirectory has a very
6774
            "filename">recollgui</code> subdirectory has a very
6758
            embryonic GUI which demonstrates the highlighting and
6775
            embryonic GUI which demonstrates the highlighting and
6759
            data extraction functions.</p>
6776
            data extraction functions.</p>
6760
            <pre class="programlisting">
6777
            <pre class="programlisting">
6761
            #!/usr/bin/env python
6778
#!/usr/bin/env python
6762
            
6779
6763
                     from recoll import recoll
6780
from recoll import recoll
6764
6781
6765
db = recoll.connect()
6782
db = recoll.connect()
6766
db.setAbstractParams(maxchars=80, contextwords=4)
6783
db.setAbstractParams(maxchars=80, contextwords=4)
6767
6784
6768
query = db.query()
6785
query = db.query()
6769
nres = query.execute("some user question")
6786
nres = query.execute("some user question")
6770
print "Result count: ", nres
6787
print "Result count: ", nres
6771
if nres &gt; 5:
6788
if nres &gt; 5:
6772
nres = 5
6789
    nres = 5
6773
for i in range(nres):
6790
for i in range(nres):
6774
doc = query.fetchone()
6791
    doc = query.fetchone()
6775
print "Result #%d" % (query.rownumber,)
6792
    print "Result #%d" % (query.rownumber,)
6776
for k in ("title", "size"):
6793
    for k in ("title", "size"):
6777
print k, ":", getattr(doc, k).encode('utf-8')
6794
        print k, ":", getattr(doc, k).encode('utf-8')
6778
abs = db.makeDocAbstract(doc, query).encode('utf-8')
6795
    abs = db.makeDocAbstract(doc, query).encode('utf-8')
6779
print abs
6796
    print abs
6780
print
6797
    print
6781
6798
</pre>
6782
            
6783
          </pre>
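The sample above is written with Python 2 print statements. As a rough sketch of the same result loop in Python 3 syntax, the function below collects the output lines instead of printing them. It assumes that field values and abstracts are already str (so no encode() call is made), and it takes the abstract builder as a parameter. format_results, FakeDoc and FakeQuery are hypothetical names introduced here for illustration; only fetchone(), rownumber and the title/size fields come from the sample.

```python
def format_results(query, nres, make_abstract, limit=5):
    """Format at most `limit` results as a list of text lines.

    `query` needs fetchone() and rownumber, as in the sample above.
    `make_abstract` is called as make_abstract(doc, query).
    """
    lines = []
    for _ in range(min(nres, limit)):
        doc = query.fetchone()
        lines.append("Result #%d" % (query.rownumber,))
        for k in ("title", "size"):
            lines.append("%s : %s" % (k, getattr(doc, k)))
        lines.append(make_abstract(doc, query))
        lines.append("")
    return lines


class FakeDoc:
    """Hypothetical stand-in for a result document."""

    def __init__(self, title, size):
        self.title = title
        self.size = size


class FakeQuery:
    """Hypothetical stand-in for a query, used to exercise the loop offline."""

    def __init__(self, docs):
        self._docs = docs
        self.rownumber = 0  # index of the next result to be fetched

    def fetchone(self):
        doc = self._docs[self.rownumber]
        self.rownumber += 1
        return doc


fake_query = FakeQuery([FakeDoc("Doc %d" % i, str(100 * i)) for i in range(8)])
output = format_results(fake_query, 8, lambda doc, q: "abstract of " + doc.title)
```

With an actual index, the call would presumably look like format_results(query, nres, db.makeDocAbstract), followed by print("\n".join(lines)) on the returned list.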
6784
          </div>
6799
          </div>
6785
        </div>
6800
        </div>
6786
        <div class="sect2">
6801
        <div class="sect2">
6787
          <div class="titlepage">
6802
          <div class="titlepage">
6788
            <div>
6803
            <div>