<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Recoll 1.20 series release notes</title>
<meta name="Author" content="Jean-Francois Dockes">
<meta name="Description"
content="recoll is a simple full-text search system for unix and linux based on the powerful and mature xapian engine">
<meta name="Keywords" content="full text search, desktop search, unix, linux">
<meta http-equiv="Content-language" content="en">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<meta name="robots" content="All,Index,Follow">
<link type="text/css" rel="stylesheet" href="styles/style.css">
</head>
<body>
<div class="rightlinks">
<ul>
<li><a href="index.html">Home</a></li>
<li><a href="download.html">Downloads</a></li>
<li><a href="doc.html">Documentation</a></li>
</ul>
</div>
<div class="content">
<h1>Release notes for Recoll 1.20.x</h1>
<h2>Caveats</h2>
<p><em>Installing over an older version</em>: 1.19 </p>
<p>Installing 1.20 over a 1.19 index is possible, but there
have been small changes in the way compound words (e.g. email
addresses) are indexed, so it is best to reset the
index. Still, in a pinch, a 1.20 search can mostly use a 1.19
index.</p>
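<p>To see why a reset is recommended, the compound-word change
(described under the 1.20.1 changes below) can be sketched in
Python. This is only an illustration, not Recoll's actual text
splitter: the function name <tt>compound_terms</tt> and
its <tt>anchored_only</tt> flag are hypothetical, and
only <tt>@</tt> and <tt>.</tt> are assumed to act as
separators.</p>
<pre>
```python
import re


def compound_terms(text, anchored_only=False):
    """Return the terms generated from a compound string (illustrative only).

    With anchored_only=True, only the simple words and the spans starting
    at the first word are produced (the older behaviour described in these
    notes). Otherwise every contiguous span of words is produced.
    """
    # Split into words and the separators between them, e.g.
    # "jfd@recoll.org" gives ["jfd", "@", "recoll", ".", "org"].
    parts = re.split(r"([@.])", text)
    words = parts[0::2]
    seps = parts[1::2]

    terms = []
    for start in range(len(words)):
        for end in range(start, len(words)):
            # In anchored mode, skip multi-word spans not starting at word 0.
            if anchored_only and start != 0 and end != start:
                continue
            term = words[start]
            for k in range(start, end):
                term += seps[k] + words[k + 1]
            terms.append(term)
    return terms
```
</pre>
<p>Comparing <tt>compound_terms("jfd@recoll.org")</tt> with
the <tt>anchored_only=True</tt> variant shows a single extra
term, <em>recoll.org</em>, which is the one the older splitter did
not generate.</p>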
<p>Always reset the index if you do not know which version
created it (that is, if you are not sure it is at least 1.18). The best method
is to quit all Recoll programs and delete the index directory
(<span class="literal">
rm&nbsp;-rf&nbsp;~/.recoll/xapiandb</span>), then start <code>recoll</code>
or <code>recollindex</code>. <br>
<span class="literal">recollindex&nbsp;-z</span> will do the same
in most, but not all, cases. It's better to use
the <tt>rm</tt> method, which also ensures that no debris
from older releases remains (e.g. old stemming files which are
no longer used).</p>
<p>Case/diacritics sensitivity is off by default. It can be
turned on <em>only</em> by editing
recoll.conf (
<a href="usermanual/usermanual.html#RCL.INDEXING.CONFIG.SENS">
see the manual</a>). If you do so, you must then reset the
index.</p>
<h2><a name="minor_releases">Minor releases at a glance</a></h2>
<ul>
<li>1.20.2 fixes a bug which prevented the real time indexer
from indexing the web history queue (the queue was still
processed at startup).</li>
</ul>
<h2>Changes in Recoll 1.20.1</h2>
<ul>
<li>An <em>Open With</em> entry was added to the result list
and result table popup menus. This lets you choose an
alternative application to open a document. The list of
applications is built from the information inside
the <span class="filename">
/usr/share/applications</span> desktop files.</li>
<li>A new way of specifying multiple terms to be searched
inside a given field: previously, an entry lacking
whitespace but splittable, like [term1,term2], was
transformed into a phrase search, which made sense in some
cases, but not many. The code was changed so that
[term1,term2] now means [term1&nbsp;AND&nbsp;term2], and
[term1/term2] means [term1&nbsp;OR&nbsp;term2]. This is
useful for field searches where you would previously be
forced to repeat the field name for every term:
[somefield:term1&nbsp;somefield:term2] can now be expressed as
[somefield:term1,term2].
</li>
<li>(1.20.1) The <b>Query Fragments</b> tool was added to
the GUI. This is a window with customizable buttons to add
arbitrary query language fragments to the current
search. The buttons and fragments are defined in an XML
file inside the Recoll configuration
directory (<span class="filename">~/.recoll/fragbuts.xml</span>). This
makes it easy to define "pre-cooked" filters for things
that you need repeatedly.
<a href="usermanual/usermanual.html#RCL.SEARCH.GUI.FRAGBUTS">
See the manual</a> for more details.</li>
<li>We changed the way terms are generated from a compound
string (e.g. an email address). Previously, for an address
like <em>jfd@recoll.org</em>, only the simple terms and
the terms anchored at the start were generated
(<em>jfd</em>, <em>recoll</em>, <em>org</em>, <em>jfd@recoll</em>, <em>jfd@recoll.org</em>). The
new text splitter generates all the other possible terms
(here, <em>recoll.org</em> only), so that it is now
possible to search for left-truncated versions of the
compound, e.g., all emails from a given domain.</li>
<li>(1.20.1) New keyboard accelerators for the result table: Ctrl+r
switches the focus from the search entry to the table,
Ctrl+o opens the document for the current line, Ctrl+Shift+o
opens the document and closes Recoll, Ctrl+d previews the
document.</li>
<li>(1.20.1) A special term is now indexed for results from the web
history: use "-rclbes:BGL" to exclude the web results,
"rclbes:BGL" to restrict the results to the web ones. This
is difficult to remember, but the Query Fragments feature
means that you don't need to (this is in the sample Query
Fragments file).</li>
<li>Recoll now indexes <em>#hashtags</em> as such.</li>
<li>It is now possible to configure the GUI in wide form
factor by dragging the toolbars to one of the sides (their
location is remembered between sessions), and moving the
category filters to a menu (can be set in the
"Preferences->GUI configuration" panel).</li>
<li>We added the <em>indexedmimetypes</em> and
<em>excludedmimetypes</em> variables to the configuration
GUI, which was also compacted a bit. A bunch of
uninteresting variables were also removed.</li>
<li>When indexing, we no longer add the top container
file name as a term for the contained sub-documents (if
any). This made no sense in most cases, as it meant that
you would get hits on all the sections from a chm or epub
when the top file name matched the search, when you
probably wanted only the parent document in this case.<br>
However, the container file name was sometimes useful for
filtering results, and it is still accessible, in a
different way: the top container file name is added as a
term to all the sub-documents, <em>only for searching with
a prefix</em>. The field name
is <span class="literal">containerfilename</span>, and no
match on the subdocuments will occur if the field is not
specified (this is different from the
previous <span class="literal">filename</span> processing,
where the name was indexed as a general
term). <span class="literal">containerfilename</span> is
also set on files without sub-documents (e.g. a pdf).</li>
<li>A new attribute, <span class="literal">pfxonly</span>,
was created to support the above change. This can be set
on any metadata field inside
the <span class="literal">[prefixes]</span> section of
the <span class="filename">fields</span> file. The
affected field terms will be indexed <em>only with a
prefix</em>, so they will cause a hit only for a field
search (the general behaviour is that field terms are
indexed both prefixed and not, so they can also cause a
hit when searched as general terms).</li>
<li>A new <span class="literal">[queryaliases]</span>
section was created in
the <span class="filename">fields</span> file, for defining
field name aliases to be used only at query time (to avoid
unwanted collection of data on random fields during
indexing). The section is empty by default, but two obvious
aliases are provided, commented out: <span class="literal">filename=fn</span>
and <span class="literal">containerfilename=cfn</span>. Setting
them in your personal file may save you some typing if you
search on file names.</li>
<li>You can now use both <em>-e</em> and <em>-i</em> for
erasing then updating the index for the given file
arguments with the same <em>recollindex</em> command.</li>
<li>We now allow access to the Xapian docid for Recoll
documents in <span class="command">recollq</span> and
Python API search results. This allows writing scripts
which combine Recoll and pure Xapian operations. A sample
Python program which finds duplicate documents using MD5
terms was added: see
<span class="filename">src/python/samples/docdups.py</span>.</li>
<li>The command used to identify the mime types of files
when the internal method fails used to be hard-coded
as <span class="literal">file -i</span>. It is now
possible to customize this command by setting
the <span class="literal">systemfilecommand</span> variable in the
configuration. A suggested value would
be <span class="filename">xdg-mime</span>, which sometimes
works better than <span class="filename">file</span>.</li>
<li>The result list has two new elements: %P substitution
for printing the parent folder name, and an <tt>F</tt>
link target which will open the parent folder in a
file manager window, e.g. <tt>&lt;a&nbsp;href='F%N'&gt;Open parent directory&lt;/a&gt;</tt>
</li>
<li><span class="filename">/media</span> was added to the default
skippedPaths list mostly as a reminder that blindly
processing these with the general indexer is a bad idea
(use separate indexes instead).</li>
<li><span class="command">recollq</span>
and <span class="command">recoll&nbsp;-t</span> get a new
option <span class="literal">-N</span> to print field
names between values when
<span class="literal">-F</span> is used. In addition,
<span class="literal">-F&nbsp;""</span> is taken as a
directive to print all fields.</li>
<li>Unicode <span class="literal">hyphen</span> (0x2010) is
now translated to ASCII
<span class="literal">minus</span>
during indexing and searching. There is no good way to
handle this character, given the various misuses of minus
and hyphen. This choice was deemed "less bad" than the
previous one.</li>
<!-- <li>The purple filter (for pidgin and other messaging apps)
has been updated for newer log formats.</li> -->
</ul>
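<p>The comma and slash shorthand for field searches described in
the list above can be illustrated with a small Python sketch. It is
not Recoll's query parser: <tt>expand_field_clause</tt> is a
hypothetical helper which only rewrites a single whitespace-free
clause into the equivalent explicit form, and it does not attempt
to handle clauses mixing both separators.</p>
<pre>
```python
def expand_field_clause(clause):
    """Rewrite 'field:a,b' as 'field:a AND field:b' and
    'field:a/b' as 'field:a OR field:b' (illustrative only)."""
    field, colon, value = clause.partition(":")
    if not colon:
        # No field prefix: the whole clause is the value.
        field, value = "", clause
    prefix = field + ":" if field else ""

    if "," in value:
        operator, terms = " AND ", value.split(",")
    elif "/" in value:
        operator, terms = " OR ", value.split("/")
    else:
        # Nothing to expand.
        return clause

    # Repeat the field prefix for every non-empty term.
    return operator.join(prefix + term for term in terms if term)
```
</pre>
<p>With this
sketch, <tt>expand_field_clause("somefield:term1,term2")</tt>
gives <tt>somefield:term1 AND somefield:term2</tt>, which is the
long form that previously had to be typed by hand.</p>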
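<p>The Unicode hyphen translation listed above can be pictured as a
single character mapping applied to both the indexed text and the
query string, so that the two sides always agree. The Python sketch
below shows the idea only; <tt>normalize_hyphen</tt> is a
hypothetical name, and Recoll's own implementation is not shown
here.</p>
<pre>
```python
# Map U+2010 HYPHEN to the ASCII minus sign (U+002D HYPHEN-MINUS).
HYPHEN_TO_MINUS = str.maketrans({"\u2010": "-"})


def normalize_hyphen(text):
    """Return text with every U+2010 replaced by an ASCII minus."""
    return text.translate(HYPHEN_TO_MINUS)
```
</pre>
<p>Text which already uses the ASCII minus is returned unchanged, so
the same function can be applied to documents and queries
alike.</p>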
</div>
</body>
</html>