Parent: [3d3f91] (diff)

Download this file

release-1.19.html    262 lines (247 with data), 14.4 kB

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Recoll 1.19 series release notes</title>
<meta name="Author" content="Jean-Francois Dockes">
<meta name="Description"
content="recoll is a simple full-text search system for unix and linux based on the powerful and mature xapian engine">
<meta name="Keywords" content="full text search, desktop search, unix, linux">
<meta http-equiv="Content-language" content="en">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<meta name="robots" content="All,Index,Follow">
<link type="text/css" rel="stylesheet" href="styles/style.css">
</head>
<body>
<div class="rightlinks">
<ul>
<li><a href="index.html">Home</a></li>
<li><a href="download.html">Downloads</a></li>
<li><a href="doc.html">Documentation</a></li>
</ul>
</div>
<div class="content">
<h1>Release notes for Recoll 1.19.x</h1>
<h2>Caveats</h2>
<p><em>Installing over an older version</em>: 1.19 </p>
<p>Case/diacritics sensitivity is still off by default for this release. It can
be turned on <em>only</em> by editing recoll.conf (<a
href="usermanual/usermanual.html#RCL.INDEXING.CONFIG.SENS">see the manual</a>).
If you do so, you must then reset the index.</p>
<p>To be safe, always reset the index when upgrading to 1.19. There
was a <a href="#rodb">persistent index corruption issue</a> in 1.18
and earlier versions.
The simplest way to do this is to quit all Recoll
programs and just delete the index directory (
<span class="literal">rm��-rf��~/.recoll/xapiandb</span>), then start
<code>recoll</code> or <code>recollindex</code>. <br>
<span class="literal">recollindex��-z</span> ��will do the same in most, but
not all, cases. It's better to use the <tt>rm</tt> method, which will also
ensure that no debris from older releases remain (e.g.: old stemming files
which are not used any more).</p>
<p>Installing 1.19 over an 1.18 index will force a lot of reindexing anyway
because Recoll switched to using <i>st_ctime</i> instead of <i>st_mtime</i> to
detect file modifications, meaning that all files which were modified since
created will be updated.</p>
<p><span class="important">Viewer exceptions</span>: as in 1.18 (but we kept
this section for 1.17 users), there is a list of mime types that should be
opened with the locally configured application even when <em>Use Desktop
Preferences</em> is checked. This allows making use of new functions (direct
access to page), which could not be available through the desktop's
<tt>xdg-open</tt>. The default list contains PDF, Postscript and DVI, which
should be opened with the <em>evince</em> (or <em>atril</em> for Mint/MATE
users) viewer for the page access functions to work. If you want to keep the
previous behaviour (losing the page number functionality), you need to prune
the list after installation . This can be done from the <em>Preferences-&gt;Gui
Configuration</em> menu.</p>
<h2><a name="minor_releases">Minor releases at a glance</a></h2>
<ul>
<li>1.19.14p2 fixes another reference count issue in the Python
module (a problem with the Query iterator put in evidence by the
change in 1.19.14p1). It also changes the handling of diacritics
for Bengali (accents are now unstripped, as for Hindi).</li>
<li>1.19.14p1 fixes a descriptor and memory leak in the Python
module. The main library and programs are unchanged.</li>
<li>1.19.14 fixes relatively minor but ennoying issues in
indexing, plus a few other glitches:
<ul>
<li id="rodb" class="important">The use of a separate readonly
Database object
for querying the index while indexing would trigger Xapian
errors, (bad block reads), and subsequent up-to-date check
failures (leading to unnecessary reindexing). The jury is out
as to the cause, but using the same object for reading and
writing seems to eliminate the problem. This is linked to
a <a href="http://trac.xapian.org/ticket/645">Xapian
ticket</a>.</li>
<li>An unnecessary log message in the child process between
forking and executing the filter could block on a mutex, and
lead to a 20 mn timeout for the affected father process thread
(happened only in multithread mode).</li>
<li>Also a possible overflow of the filter stack. This could only
really happen in pathological situations (hand-crafted recursive
zip file...).</li>
</ul>
</li>
<li>1.19.13 hopefully fixes the rare but longstanding multithread
indexing crashes, which I hope were actually due to the now
corrected mismanagement of Xapian::Document objects. It also
silences noisy but mostly harmless ppt-dump.py crashes.</li>
<li>1.19.12p2 fulfills an old promise that I had forgotten: have a
double-click in the result table run an "open file" action. It just
had waited for too long...</li>
<li>1.19.12p1 fixes the 1.19.12 install script which did not
actually copy the xls filter...</li>
<li>1.19.12 adds a parameter for setting the truncation size of the
stored metadata attributes, and a new XLS filter.</li>
<li>1.19.11 is a fix to the install script in 1.19.10. The latter
did not copy the new ppt extraction code to the filters directory.</li>
<li>1.19.10 has a bit more changes than
usually goes into a Recoll minor version, and could have been
1.20.0 instead. On the other hand, it
brings some features which needed to be released, and did not
really warrant a major version. So here goes:
<ul>
<li>Python3 compatibility for the Python Recoll module.</li>
<li>A Ubuntu Unity Scope for saucy (13.10), replacing the
lens (and which needed Python 3).</li>
<li>A new PPT format text extractor. Catppt just did not extract
anything from more recent .ppt files.</li>
</ul> Mostly, if you are not running Ubuntu (Saucy or later), you
can just <a href="filters/filters.html">download the ppt
filter</a> and stay with 1.19.9.
</li> <li>1.19.9 fixes a few
significant <a href="BUGS.html#b_1_19_8">bugs</a>, mostly a very
serious one about date filtering, and a quite common GUI
crash.</li>
<li>1.19.8 changes the way we handle Hindi / Devanagari
text (no more stripping of diacritics), and also has a fix for the
results table dups and snippets links.</li>
<li>1.19.7 is 1.19.5 with a few build and packaging fixes. No need
to update.</li>
<li>1.19.5 works around a Linux kernel bug that would make it
impossible to index data from a network share mounted through cifs
(this worked in 1.18 and stopped working in 1.19 because of its
wider use of extended attributes)</li>
<li>1.19.4 has a German translation, and a few fixes
for <a href="BUGS.html#b_1_19_3">relatively ennoying
bugs</a>.</li>
<li>1.19.3 has more translations (Spanish, Russian, Czech), and a few
minor bug and usability fixes.</li>
<li>1.19.2 fixes a bug in path translations for additional indexes.</li>
<li>1.19.1 was released 2 hours after 1.19.0 (book of records anyone?)
because of a bug in the advanced search history feature which crashed
the GUI as soon as a <i>filename</i> search was performed.</li>
</ul>
<h2>Changes in Recoll 1.19.0</h2>
<ul>
<li>Indexing can use multiple threads. This can be a major performance boost
for people with multiprocessor machines and big indexes. The threads setup
is roughly auto-configured when recollindex starts, based on the number of
processors, but it is also possible to taylor it in the configuration.There
is a <a
href="http://www.recoll.org/usermanual/usermanual.html#RCL.INSTALL.CONFIG.RECOLLCONF.IDXTHREADS">section
in the manual</a> to describe the configuration, and also some <a
href="http://www.recoll.org/idxthreads/threadingRecoll.html">notes about
the transformation and the performance improvements</a>. </li>
<li>There is a new result list/table popup menu option to display all the
sub-documents for a given one. This is mostly useful to display the
attachments to an email. The resulting screen can be used to select
multiple entries and save them to files.</li>
<li>It is now possible to use OR with "dir:" clauses, and wildcards have been
enabled.</li>
<li>When the option to follow symbolic links is not set -which is the
default- symbolic links are now indexed as such (name and
content).</li>
<li>The advanced search panel now has a history feature. Use the
up/down arrows to walk the search history list.</li>
<li>There are new GUI configuration options to run in "search as you type"
mode (which I don't find useful at all...), and to disable the Qt
auto-completion inside the simple search string. The completion was often
more confusing and ennoying than useful, especially because it is
case-insensitive when case sometimes matter for Recoll searches
(capitalization to disable stemming).</li>
<li>When the option to collapse identical results is used, documents which do
have duplicates are shown with a link to list the clones. This function
needs new data from the index, so it will only completely work after a full
1.19 reindex.</li>
<li>Recoll should now behave reasonably on video files: index the name and
propose an Open button in the result list to start the configured
player.</li>
<li>Thanks to Recoll user <a href="https://github.com/koniu">Koniu</a>, you
can now access your Recoll indexes through a Web browser interface. The
server side is based on the <a href="http://bottlepy.org/docs/dev/">Bottle
Python Web framework</a> and the Recoll Python module, and can run
self-contained (no necessity to run apache or another web server), so it's
quite simple to set up. See: See the <a
href="https://github.com/koniu/recoll-webui/">Recoll WebUI project</a> on
GitHub. </li>
<li>Thanks to Recoll user David, there is now a filter to index and retrieve
<b>Lotus Notes</b> messages. See the software <a
href="http://sourceforge.net/projects/rcollnotesfiltr/">site on
sourceforge</a> and some <a
href="http://richardappleby.wordpress.com/2013/04/11/you-dont-have-to-know-the-answer-to-everything-just-how-to-find-it/">notes</a>
from a user with a slightly different configuration.</li>
<li>There is a new path translation facility, with a GUI interface, to make
it easier to share an index from a network share on clients on which the
mount points might be different. This could also probably be put to use to
design a "portable index" feature (for removable media).</li>
<li>The first indexing run after Recoll installation (for a new user) will
run in a fashion which will put data likely to be useful into the index
faster, so that an impatient user can more quickly try searches.</li>
<li>Implemented cache for last file uncompressed. This will much improve
usage, e.g. for people fetching successive messages from a compressed mail
folder.</li>
<li>Recollindex will now change its current directory to a temporary one
(e.g. /tmp) to mitigate the problems of some filters creating temporary
files and not cleaning them.</li>
<li>There is a new recursive explicit reindex option to the command line
indexer.</li>
<li>The default result list paragraph format has been slightly tweaked
(removed the relevance percentage and small ordering and formatting
changes).</li>
<li>Mime type wildcard expansion is now performed against the index, not the
configuration. This fixes many problems when searching for, e.g., media
files indexed only by name.</li>
<li>The choice for case/diacritics sensitivity is now fully processed during
wildcard expansion (for case-sensitive indexes).</li>
<li>The Snippets popup (list of pages and excerpts typically produced for PDF
documents) can now use an external CSS stylesheet. This is useful because
the Qt Webkit objects do not fully inherit the Qt configuration so that,
for example, a style sheet is needed for using a different background
color. The style sheet is chosen from the <tt>Preferences-&gt;GUI
configuration-&gt;Result list</tt> panel.</li>
<li>Improved handling of filters during indexing resulting in less
subprocesses.</li>
<li>Added function to import tags from external application (e.g. Tmsu).</li>
<li>Changed format for rclaptg field. Was colon-separated, now uses normal
value/attributes syntax with an empty value like:
<pre> localfields = ; attr1 = val1 ; attr2 = val2
</pre>
</li>
<li>Extended file attributes are now indexed by default. As a side effect,
recoll now uses st_ctime, not st_mtime to detect file changes. This means
that installing 1.19 will reindex many files (all those that were modified
since created). Recoll also now processes the <tt>charset</tt> and
<tt>mime_type</tt> standardized extended attributes.</li>
<li>The Python module has been expanded to include the interface for
extracting data. This means that you could now write most of the Recoll GUI
in Python if you wished. There is a <a
href="https://bitbucket.org/medoc/recoll/src/5b4bd9ef26a1/src/python/samples/recollgui/qrecoll.py?at=default">bit
of sample code</a> in the source package doing just this. A few
incompatible changes had to be made to the Python module. Especially the
"Query.next" field is gone and the module structure has been changed
(different import statement needed). Adapting your code is trivial, have a
look at the changes in the <a
href="https://bitbucket.org/medoc/recoll/src/5b4bd9ef26a10912bf8bd833fe6c084bd5a7bdbd/src/desktop/unity-lens-recoll/recollscope/rclsearch.py?at=default">Unity
Lens module</a> for an example. The new module is compatible with the <a
href="http://www.python.org/dev/peps/pep-0249/">Python Database API
Specification v2.0</a> for the parts that make sense for a non-relational
DB.</li>
<li>Recoll now uses a dynamic library for the code shared by the query
interface, the indexer and the Python module. This should have no visible
impact but was rendered necessary by the Python module evolutions.</li>
<li>And quite a few <a href="BUGS.html#b_1_18_2">Fixed bugs</a></li>
</ul>
</div>
</body>
</html>