Diff of /src/doc/user/usermanual.xml [216c69] .. [fe2eb1]

--- a/src/doc/user/usermanual.xml
+++ b/src/doc/user/usermanual.xml
@@ -5196,13 +5196,13 @@
 
           <para>Index queries do not provide document content (only a
           partial and unprecise reconstruction is performed to show the
-          snippets text). In order to access the actual document data, 
-          the data extraction part of the indexing process
-          must be performed (subdocument access and format
-          translation). This is not trivial in
-          general. The <literal>rclextract</literal> module currently
-          provides a single class which can be used to access the data
-          content for result documents.</para>
+          snippets text). In order to access the actual document data, the
+          data extraction part of the indexing process must be performed
+          (subdocument access and format translation). This is not trivial
+          in the case of embedded documents. The
+          <literal>rclextract</literal> module provides a single class
+          which can be used to access the data content for result
+          documents.</para>
 
           <sect4 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES">
             <title>Classes</title>
@@ -5220,30 +5220,43 @@
                 </varlistentry>
                 <varlistentry>
                   <term>Extractor.textextract(ipath)</term>
-                  <listitem><para>Extract document defined
-                  by <replaceable>ipath</replaceable> and return
-                  a <literal>Doc</literal> object. The doc.text field
-                  has the document text converted to either text/plain or
-                  text/html according to doc.mimetype. The typical use
-                  would be as follows:
-                  <programlisting>
-                    qdoc = query.fetchone()
-                    extractor = recoll.Extractor(qdoc)
-                    doc = extractor.textextract(qdoc.ipath)
-                    # use doc.text, e.g. for previewing
-                  </programlisting>
-                  </para></listitem>
+                  <listitem><para>Extract the document defined by
+                  <replaceable>ipath</replaceable> and return a
+                  <literal>Doc</literal> object. The
+                  <literal>doc.text</literal> field has the document text
+                  converted to either text/plain or text/html according to
+                  <literal>doc.mimetype</literal>. The typical use would be
+                  as follows:</para>
+<programlisting>
+qdoc = query.fetchone()
+extractor = rclextract.Extractor(qdoc)
+doc = extractor.textextract(qdoc.ipath)
+# use doc.text, e.g. for previewing</programlisting>
+                  <para>Passing <literal>qdoc.ipath</literal> to
+                  <literal>textextract()</literal> is redundant, but
+                  reflects the fact that an <literal>Extractor</literal>
+                  object can also access the other entries in a compound
+                  document.</para>
+                  </listitem>
                 </varlistentry>
                 <varlistentry>
                   <term>Extractor.idoctofile(ipath, targetmtype, outfile='')</term>
                   <listitem><para>Extracts document into an output file,
                   which can be given explicitly or will be created as a
-                  temporary file to be deleted by the caller. Typical use:
-                  <programlisting>
-                    qdoc = query.fetchone()
-                    extractor = recoll.Extractor(qdoc)
-                  filename = extractor.idoctofile(qdoc.ipath, qdoc.mimetype)</programlisting>
-
+                  temporary file to be deleted by the caller. Typical
+                  use:</para> 
+<programlisting>
+qdoc = query.fetchone()
+extractor = rclextract.Extractor(qdoc)
+filename = extractor.idoctofile(qdoc.ipath, qdoc.mimetype)</programlisting>
+ 
+                  <para>In all cases the output is a copy, even if the
+                  requested document is an ordinary file on disk, which
+                  may be wasteful. If you want to avoid this, you can test
+                  for a simple file document as follows:
+<programlisting>
+not doc.ipath and ("rclbes" not in doc.keys() or doc["rclbes"] == "FS")
+</programlisting>
                   </para></listitem>
                 </varlistentry>
 
@@ -5252,6 +5265,7 @@
             </sect5> <!-- Extractor class -->
           </sect4> <!-- rclextract classes -->
         </sect3> <!-- rclextract module -->
+
 
         <sect3 id="RCL.PROGRAM.PYTHONAPI.SEARCH.EXAMPLE">
           <title>Search API usage example</title>
@@ -5263,10 +5277,10 @@
           has a very embryonic GUI which demonstrates the
           highlighting and data extraction functions.</para>
 
-          <programlisting>
-            #!/usr/bin/env python
-            <![CDATA[
-                     from recoll import recoll
+<programlisting><![CDATA[
+#!/usr/bin/env python
+
+from recoll import recoll
 
 db = recoll.connect()
 db.setAbstractParams(maxchars=80, contextwords=4)
@@ -5275,18 +5289,16 @@
 nres = query.execute("some user question")
 print "Result count: ", nres
 if nres > 5:
-nres = 5
+    nres = 5
 for i in range(nres):
-doc = query.fetchone()
-print "Result #%d" % (query.rownumber,)
-for k in ("title", "size"):
-print k, ":", getattr(doc, k).encode('utf-8')
-abs = db.makeDocAbstract(doc, query).encode('utf-8')
-print abs
-print
-
-            ]]>
-          </programlisting>
+    doc = query.fetchone()
+    print "Result #%d" % (query.rownumber,)
+    for k in ("title", "size"):
+        print k, ":", getattr(doc, k).encode('utf-8')
+    abstract = db.makeDocAbstract(doc, query).encode('utf-8')
+    print abstract
+    print
+]]></programlisting>
 
         </sect3>
       </sect2>
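The simple-file test given in the new idoctofile() text can be wrapped in a helper that only requests a copy when one is really needed. The Python sketch below is illustrative, not part of Recoll: is_simple_file_doc() and file_for_doc() are hypothetical names, the extractor class is passed in as a parameter (in real use, presumably the Extractor class of the rclextract module), and it assumes that doc.url holds a plain file:// URL for documents coming from the file system backend.

```python
# Minimal sketch: decide whether a Recoll result document can be used in
# place, or must be copied out with Extractor.idoctofile().
# is_simple_file_doc() and file_for_doc() are hypothetical helper names.

FILE_URL_PREFIX = "file://"


def is_simple_file_doc(doc):
    """True for a top-level document stored by the file system backend.

    Same test as in the manual text: no internal path (ipath) and either
    no "rclbes" field or a backend value of "FS".
    """
    return not doc.ipath and ("rclbes" not in doc.keys()
                              or doc["rclbes"] == "FS")


def file_for_doc(doc, make_extractor):
    """Return a (path, is_temporary) tuple for a result document.

    make_extractor is called as make_extractor(doc) and must return an
    object with an idoctofile(ipath, targetmtype) method.
    """
    if is_simple_file_doc(doc):
        # Assumption: file system documents have a file:// URL, so the
        # path is the URL with that prefix removed (no %-decoding done).
        if doc.url.startswith(FILE_URL_PREFIX):
            return doc.url[len(FILE_URL_PREFIX):], False
    # Embedded document (or unexpected URL): ask for a temporary copy,
    # which the caller is responsible for deleting.
    extractor = make_extractor(doc)
    return extractor.idoctofile(doc.ipath, doc.mimetype), True
```

Assuming Recoll's Python package is installed, a caller could run from recoll import rclextract, then path, is_tmp = file_for_doc(qdoc, rclextract.Extractor), and delete path with os.unlink() when done if is_tmp is true. Passing the class in, rather than importing it inside the helper, keeps the decision logic testable without a Recoll index.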