|
a/src/doc/user/usermanual.html |
|
b/src/doc/user/usermanual.html |
|
... |
|
... |
18 |
alink="#0000FF">
|
18 |
alink="#0000FF">
|
19 |
<div lang="en" class="book">
|
19 |
<div lang="en" class="book">
|
20 |
<div class="titlepage">
|
20 |
<div class="titlepage">
|
21 |
<div>
|
21 |
<div>
|
22 |
<div>
|
22 |
<div>
|
23 |
<h1 class="title"><a name="idp59627200" id=
|
23 |
<h1 class="title"><a name="idp57237872" id=
|
24 |
"idp59627200"></a>Recoll user manual</h1>
|
24 |
"idp57237872"></a>Recoll user manual</h1>
|
25 |
</div>
|
25 |
</div>
|
26 |
|
26 |
|
27 |
<div>
|
27 |
<div>
|
28 |
<div class="author">
|
28 |
<div class="author">
|
29 |
<h3 class="author"><span class=
|
29 |
<h3 class="author"><span class=
|
|
... |
|
... |
107 |
<dt><span class="sect2">2.1.2. <a href=
|
107 |
<dt><span class="sect2">2.1.2. <a href=
|
108 |
"#RCL.INDEXING.INTRODUCTION.CONFIG">Configurations,
|
108 |
"#RCL.INDEXING.INTRODUCTION.CONFIG">Configurations,
|
109 |
multiple indexes</a></span></dt>
|
109 |
multiple indexes</a></span></dt>
|
110 |
|
110 |
|
111 |
<dt><span class="sect2">2.1.3. <a href=
|
111 |
<dt><span class="sect2">2.1.3. <a href=
|
112 |
"#idp65068656">Document types</a></span></dt>
|
112 |
"#idp63233312">Document types</a></span></dt>
|
113 |
|
113 |
|
114 |
<dt><span class="sect2">2.1.4. <a href=
|
114 |
<dt><span class="sect2">2.1.4. <a href=
|
115 |
"#idp65088336">Indexing failures</a></span></dt>
|
115 |
"#idp63252992">Indexing failures</a></span></dt>
|
116 |
|
116 |
|
117 |
<dt><span class="sect2">2.1.5. <a href=
|
117 |
<dt><span class="sect2">2.1.5. <a href=
|
118 |
"#idp65095792">Recovery</a></span></dt>
|
118 |
"#idp63260448">Recovery</a></span></dt>
|
119 |
</dl>
|
119 |
</dl>
|
120 |
</dd>
|
120 |
</dd>
|
121 |
|
121 |
|
122 |
<dt><span class="sect1">2.2. <a href=
|
122 |
<dt><span class="sect1">2.2. <a href=
|
123 |
"#RCL.INDEXING.STORAGE">Index storage</a></span></dt>
|
123 |
"#RCL.INDEXING.STORAGE">Index storage</a></span></dt>
|
|
... |
|
... |
979 |
|
979 |
|
980 |
<div class="sect2">
|
980 |
<div class="sect2">
|
981 |
<div class="titlepage">
|
981 |
<div class="titlepage">
|
982 |
<div>
|
982 |
<div>
|
983 |
<div>
|
983 |
<div>
|
984 |
<h3 class="title"><a name="idp65068656" id=
|
984 |
<h3 class="title"><a name="idp63233312" id=
|
985 |
"idp65068656"></a>2.1.3. Document types</h3>
|
985 |
"idp63233312"></a>2.1.3. Document types</h3>
|
986 |
</div>
|
986 |
</div>
|
987 |
</div>
|
987 |
</div>
|
988 |
</div>
|
988 |
</div>
|
989 |
|
989 |
|
990 |
<p><span class="application">Recoll</span> knows about
|
990 |
<p><span class="application">Recoll</span> knows about
|
|
... |
|
... |
1073 |
|
1073 |
|
1074 |
<div class="sect2">
|
1074 |
<div class="sect2">
|
1075 |
<div class="titlepage">
|
1075 |
<div class="titlepage">
|
1076 |
<div>
|
1076 |
<div>
|
1077 |
<div>
|
1077 |
<div>
|
1078 |
<h3 class="title"><a name="idp65088336" id=
|
1078 |
<h3 class="title"><a name="idp63252992" id=
|
1079 |
"idp65088336"></a>2.1.4. Indexing
|
1079 |
"idp63252992"></a>2.1.4. Indexing
|
1080 |
failures</h3>
|
1080 |
failures</h3>
|
1081 |
</div>
|
1081 |
</div>
|
1082 |
</div>
|
1082 |
</div>
|
1083 |
</div>
|
1083 |
</div>
|
1084 |
|
1084 |
|
|
... |
|
... |
1114 |
|
1114 |
|
1115 |
<div class="sect2">
|
1115 |
<div class="sect2">
|
1116 |
<div class="titlepage">
|
1116 |
<div class="titlepage">
|
1117 |
<div>
|
1117 |
<div>
|
1118 |
<div>
|
1118 |
<div>
|
1119 |
<h3 class="title"><a name="idp65095792" id=
|
1119 |
<h3 class="title"><a name="idp63260448" id=
|
1120 |
"idp65095792"></a>2.1.5. Recovery</h3>
|
1120 |
"idp63260448"></a>2.1.5. Recovery</h3>
|
1121 |
</div>
|
1121 |
</div>
|
1122 |
</div>
|
1122 |
</div>
|
1123 |
</div>
|
1123 |
</div>
|
1124 |
|
1124 |
|
1125 |
<p>In the rare case where the index becomes corrupted
|
1125 |
<p>In the rare case where the index becomes corrupted
|
|
... |
|
... |
6377 |
</div>
|
6377 |
</div>
|
6378 |
</div>
|
6378 |
</div>
|
6379 |
|
6379 |
|
6380 |
<p><span class="application">Recoll</span> versions
|
6380 |
<p><span class="application">Recoll</span> versions
|
6381 |
after 1.11 define a Python programming interface, both
|
6381 |
after 1.11 define a Python programming interface, both
|
6382 |
for searching and indexing. The indexing portion has
|
6382 |
for searching and indexing.</p>
|
6383 |
seen little use, but the searching one is used in the
|
|
|
6384 |
Recoll Ubuntu Unity Lens and Recoll Web UI.</p>
|
|
|
6385 |
|
6383 |
|
|
|
6384 |
<p>The search interface is used in the Recoll Ubuntu
|
|
|
6385 |
Unity Lens and Recoll WebUI.</p>
|
|
|
6386 |
|
|
|
6387 |
<p>The indexing section of the API has seen little use,
|
|
|
6388 |
and is more of a proof of concept. In truth it is waiting
|
|
|
6389 |
for its killer app...</p>
|
|
|
6390 |
|
6386 |
<p>The API is inspired by the Python database API
|
6391 |
<p>The search API is modeled on the Python database
|
6387 |
specification. There were two major changes in recent
|
6392 |
API specification. There were two major changes across
|
6388 |
<span class="application">Recoll</span> versions:</p>
|
6393 |
<span class="application">Recoll</span> versions:</p>
|
6389 |
|
6394 |
|
6390 |
<div class="itemizedlist">
|
6395 |
<div class="itemizedlist">
|
6391 |
<ul class="itemizedlist" style=
|
6396 |
<ul class="itemizedlist" style=
|
6392 |
"list-style-type: disc;">
|
6397 |
"list-style-type: disc;">
|
|
|
6398 |
<li class="listitem">
|
6393 |
<li class="listitem">The basis for the <span class=
|
6399 |
<p>The basis for the <span class=
|
6394 |
"application">Recoll</span> API changed from Python
|
6400 |
"application">Recoll</span> API changed from
|
6395 |
database API version 1.0 (<span class=
|
6401 |
Python database API version 1.0 (<span class=
|
6396 |
"application">Recoll</span> versions up to 1.18.1),
|
6402 |
"application">Recoll</span> versions up to
|
6397 |
to version 2.0 (<span class=
|
6403 |
1.18.1), to version 2.0 (<span class=
|
6398 |
"application">Recoll</span> 1.18.2 and later).</li>
|
6404 |
"application">Recoll</span> 1.18.2 and
|
|
|
6405 |
later).</p>
|
|
|
6406 |
</li>
|
6399 |
|
6407 |
|
6400 |
<li class="listitem">The <code class=
|
6408 |
<li class="listitem">
|
6401 |
"literal">recoll</code> module became a package
|
6409 |
<p>The <code class="literal">recoll</code> module
|
6402 |
(with an internal <code class=
|
6410 |
became a package (with an internal <code class=
|
6403 |
"literal">recoll</code> module) as of <span class=
|
6411 |
"literal">recoll</code> module) as of
|
6404 |
"application">Recoll</span> version 1.19, in order
|
6412 |
<span class="application">Recoll</span> version
|
6405 |
to add more functions. For existing code, this only
|
6413 |
1.19, in order to add more functions. For
|
6406 |
changes the way the interface must be
|
6414 |
existing code, this only changes the way the
|
|
|
6415 |
interface must be imported.</p>
|
6407 |
imported.</li>
|
6416 |
</li>
|
6408 |
</ul>
|
6417 |
</ul>
|
6409 |
</div>
|
6418 |
</div>
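<p>As a minimal sketch of the import change described in the list above: code that has to run against both layouts could try the package form first and fall back to the plain module. The helper name <code class="literal">load_recoll</code> is hypothetical and not part of the Recoll API; it returns <code class="literal">None</code> when no Recoll Python module is installed.</p>

```python
def load_recoll():
    """Return the recoll search module, or None if it is not installed.

    Hypothetical helper for illustration only; not part of Recoll itself.
    """
    try:
        # Recoll 1.19 and later: 'recoll' is a package with an inner module.
        from recoll import recoll
        return recoll
    except ImportError:
        pass
    try:
        # Recoll before 1.19: 'recoll' is a plain module.
        import recoll
        return recoll
    except ImportError:
        # Neither layout is available on this system.
        return None


recoll_module = load_recoll()
if recoll_module is None:
    print("Recoll Python module not found")
```

<p>Trying the package form first means that an installation with the newer layout is never mistaken for the older one.</p>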
|
6410 |
|
6419 |
|
6411 |
<p>We will mostly describe the new API and package
|
6420 |
<p>We will mostly describe the new API and package
|
6412 |
structure here. A paragraph at the end of this section
|
6421 |
structure here. A paragraph at the end of this section
|
|
... |
|
... |
6431 |
<strong class=
|
6440 |
<strong class=
|
6432 |
"userinput"><code>python setup.py install</code></strong>
|
6441 |
"userinput"><code>python setup.py install</code></strong>
|
6433 |
|
6442 |
|
6434 |
</pre>
|
6443 |
</pre>
|
6435 |
|
6444 |
|
|
|
6445 |
<p>As of <span class="application">Recoll</span> 1.19,
|
|
|
6446 |
the module can be compiled for Python3.</p>
|
|
|
6447 |
|
6436 |
<p>The normal <span class="application">Recoll</span>
|
6448 |
<p>The normal <span class="application">Recoll</span>
|
6437 |
installer installs the Python API along with the main
|
6449 |
installer installs the Python2 API along with the main
|
|
|
6450 |
code. The Python3 version must be explicitly built and
|
6438 |
code.</p>
|
6451 |
installed.</p>
|
6439 |
|
6452 |
|
6440 |
<p>When installing from a repository, and depending on
|
6453 |
<p>When installing from a repository, and depending on
|
6441 |
the distribution, the Python API can sometimes be found
|
6454 |
the distribution, the Python API can sometimes be found
|
6442 |
in a separate package.</p>
|
6455 |
in a separate package.</p>
|
|
|
6456 |
|
|
|
6457 |
<p>The following small sample will run a query and list
|
|
|
6458 |
the title and URL of each result. It should
|
|
|
6459 |
work with <span class="application">Recoll</span> 1.19
|
|
|
6460 |
and later. The <code class=
|
|
|
6461 |
"filename">python/samples</code> source directory
|
|
|
6462 |
contains several examples of Python programming with
|
|
|
6463 |
<span class="application">Recoll</span>, exercising the
|
|
|
6464 |
extension more completely, and especially its data
|
|
|
6465 |
extraction features.</p>
|
|
|
6466 |
<pre class="programlisting">
|
|
|
6467 |
from recoll import recoll
|
|
|
6468 |
|
|
|
6469 |
db = recoll.connect()
|
|
|
6470 |
query = db.query()
|
|
|
6471 |
nres = query.execute("some query")
|
|
|
6472 |
results = query.fetchmany(20)
|
|
|
6473 |
for doc in results:
|
|
|
6474 |
    print(doc.url, doc.title)
|
|
|
6475 |
|
|
|
6476 |
</pre>
|
6443 |
</div>
|
6477 |
</div>
|
6444 |
|
6478 |
|
6445 |
<div class="sect3">
|
6479 |
<div class="sect3">
|
6446 |
<div class="titlepage">
|
6480 |
<div class="titlepage">
|
6447 |
<div>
|
6481 |
<div>
|
|
... |
|
... |
6562 |
<p>A Db object is created by a <code class=
|
6596 |
<p>A Db object is created by a <code class=
|
6563 |
"literal">connect()</code> call and holds a
|
6597 |
"literal">connect()</code> call and holds a
|
6564 |
connection to a Recoll index.</p>
|
6598 |
connection to a Recoll index.</p>
|
6565 |
|
6599 |
|
6566 |
<div class="variablelist">
|
6600 |
<div class="variablelist">
|
6567 |
<p class="title"><b>Methods</b></p>
|
|
|
6568 |
|
|
|
6569 |
<dl class="variablelist">
|
6601 |
<dl class="variablelist">
|
6570 |
<dt><span class="term">Db.close()</span></dt>
|
6602 |
<dt><span class="term">Db.close()</span></dt>
|
6571 |
|
6603 |
|
6572 |
<dd>Closes the connection. You can't do
|
6604 |
<dd>Closes the connection. You can't do
|
6573 |
anything with the <code class=
|
6605 |
anything with the <code class=
|
|
... |
|
... |
6626 |
created by a <code class=
|
6658 |
created by a <code class=
|
6627 |
"literal">Db.query()</code> call. It is used to
|
6659 |
"literal">Db.query()</code> call. It is used to
|
6628 |
execute index searches.</p>
|
6660 |
execute index searches.</p>
|
6629 |
|
6661 |
|
6630 |
<div class="variablelist">
|
6662 |
<div class="variablelist">
|
6631 |
<p class="title"><b>Methods</b></p>
|
|
|
6632 |
|
|
|
6633 |
<dl class="variablelist">
|
6663 |
<dl class="variablelist">
|
6634 |
<dt><span class="term">Query.sortby(fieldname,
|
6664 |
<dt><span class="term">Query.sortby(fieldname,
|
6635 |
ascending=True)</span></dt>
|
6665 |
ascending=True)</span></dt>
|
6636 |
|
6666 |
|
6637 |
<dd>Sort results by <em class=
|
6667 |
<dd>Sort results by <em class=
|
|
... |
|
... |
6803 |
the document text. See the <code class=
|
6833 |
the document text. See the <code class=
|
6804 |
"literal">rclextract</code> module for accessing
|
6834 |
"literal">rclextract</code> module for accessing
|
6805 |
document contents.</p>
|
6835 |
document contents.</p>
|
6806 |
|
6836 |
|
6807 |
<div class="variablelist">
|
6837 |
<div class="variablelist">
|
6808 |
<p class="title"><b>Methods</b></p>
|
|
|
6809 |
|
|
|
6810 |
<dl class="variablelist">
|
6838 |
<dl class="variablelist">
|
6811 |
<dt><span class="term">get(key), []
|
6839 |
<dt><span class="term">get(key), []
|
6812 |
operator</span></dt>
|
6840 |
operator</span></dt>
|
6813 |
|
6841 |
|
6814 |
<dd>Retrieve the named doc attribute</dd>
|
6842 |
<dd>Retrieve the named doc attribute</dd>
|
|
... |
|
... |
6852 |
in replacement of the query language approach. The
|
6880 |
in replacement of the query language approach. The
|
6853 |
interface is going to change a little, so no
|
6881 |
interface is going to change a little, so no
|
6854 |
detailed doc for now...</p>
|
6882 |
detailed doc for now...</p>
|
6855 |
|
6883 |
|
6856 |
<div class="variablelist">
|
6884 |
<div class="variablelist">
|
6857 |
<p class="title"><b>Methods</b></p>
|
|
|
6858 |
|
|
|
6859 |
<dl class="variablelist">
|
6885 |
<dl class="variablelist">
|
6860 |
<dt><span class=
|
6886 |
<dt><span class=
|
6861 |
"term">addclause(type='and'|'or'|'excl'|'phrase'|'near'|'sub',
|
6887 |
"term">addclause(type='and'|'or'|'excl'|'phrase'|'near'|'sub',
|
6862 |
qstring=string, slack=0, field='', stemming=1,
|
6888 |
qstring=string, slack=0, field='', stemming=1,
|
6863 |
subSearch=SearchData)</span></dt>
|
6889 |
subSearch=SearchData)</span></dt>
|
|
... |
|
... |
6912 |
</div>
|
6938 |
</div>
|
6913 |
</div>
|
6939 |
</div>
|
6914 |
</div>
|
6940 |
</div>
|
6915 |
|
6941 |
|
6916 |
<div class="variablelist">
|
6942 |
<div class="variablelist">
|
6917 |
<p class="title"><b>Methods</b></p>
|
|
|
6918 |
|
|
|
6919 |
<dl class="variablelist">
|
6943 |
<dl class="variablelist">
|
6920 |
<dt><span class=
|
6944 |
<dt><span class=
|
6921 |
"term">Extractor(doc)</span></dt>
|
6945 |
"term">Extractor(doc)</span></dt>
|
6922 |
|
6946 |
|
6923 |
<dd>An <code class="literal">Extractor</code>
|
6947 |
<dd>An <code class="literal">Extractor</code>
|