Release notes for Recoll 1.20.x
Caveats
Installing over an older version: 1.19
Installing 1.20 over a 1.19 index is possible, but there have been small changes in the way compound words (e.g. email addresses) are indexed, so it is best to reset the index. Still, in a pinch, 1.20 search can mostly use a 1.19 index.
Case/diacritics sensitivity is off by default. It can be turned on only by editing recoll.conf (see the manual). If you do so, you must then reset the index.
Always reset the index if you do not know by which version it was created (you're not sure it's at least 1.18). The best method is to quit all Recoll programs and delete the index directory (rm -rf ~/.recoll/xapiandb), then start recoll or recollindex. recollindex -z will do the same in most, but not all, cases. It's better to use the rm method, which will also ensure that no debris from older releases remain (e.g.: old stemming files which are not used any more).
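For scripted setups, the reset procedure described above could be sketched in Python roughly as follows. This is only an illustration: the function name reset_index and its parameters are hypothetical, it assumes the index lives in the xapiandb subdirectory of the configuration directory (as in the rm command above), and it does not check that all Recoll programs have been stopped first.

```python
import shutil
import subprocess
from pathlib import Path


def reset_index(confdir="~/.recoll", reindex=False):
    """Delete the index directory under confdir and optionally rebuild it.

    Hypothetical helper: assumes the index is in <confdir>/xapiandb.
    Returns True if an index directory was found and removed.
    """
    conf = Path(confdir).expanduser()
    dbdir = conf / "xapiandb"
    existed = dbdir.is_dir()
    if existed:
        # Same effect as: rm -rf ~/.recoll/xapiandb
        shutil.rmtree(dbdir)
    if reindex:
        # Rebuild from scratch; -c selects the configuration directory.
        subprocess.run(["recollindex", "-c", str(conf)], check=True)
    return existed
```

Calling reset_index(reindex=True) would then remove the default index and start a full indexing pass; quit the GUI and any running indexer before doing so.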
Minor releases at a glance
The pace of change in Recoll is slowing as the software approaches maturity. To avoid holding back progress with excessively long intervals between releases, the first versions of 1.20 will be allowed to contain some functional changes (as opposed to only bug fixes). There will be a freeze at some point.
Changes in Recoll 1.20.0
- An Open With entry was added to the result list and result table popup menus. This lets you choose an alternative application to open a document. The list of applications is built from the information inside the /usr/share/applications desktop files.
- A new way of specifying multiple terms to be searched inside a given field: it used to be that an entry lacking whitespace but splittable, like [term1,term2], was transformed into a phrase search, which made sense in some cases, but not so many. The code was changed so that [term1,term2] now means [term1 AND term2], and [term1/term2] means [term1 OR term2]. This is useful for field searches where you would previously be forced to repeat the field name for every term. [somefield:term1 somefield:term2] can now be expressed as [somefield:term1,term2].
- We changed the way terms are generated from a compound string (e.g. an email address). Previously, for an address like jfd@recoll.org, only the simple terms and the terms anchored at the start were generated (jfd, recoll, org, jfd@recoll, jfd@recoll.org). The new text splitter generates all the other possible terms (here, recoll.org only), so that it is now possible to search for left-truncated versions of the compound, e.g., all emails from a given domain.
- It is now possible to configure the GUI in wide form factor by dragging the toolbars to one of the sides (their location is remembered between sessions), and moving the category filters to a menu (can be set in the "Preferences->GUI configuration" panel).
- We added the indexedmimetypes and excludedmimetypes variables to the configuration GUI, which was also compacted a bit. A bunch of uninteresting variables were also removed.
- When indexing, we no longer add the top container file-name as a term for the contained sub-documents (if any). This made no sense at all in most cases. However, this was sometimes useful when searching email folders. Complain if you do not like this change, and I'll make it configurable.
- You can now use both -e and -i for erasing then updating the index for the given file arguments with the same recollindex command.
- We now allow access to the Xapian docid for Recoll documents in recollq and Python API search results. This allows writing scripts which combine Recoll and pure Xapian operations. A sample Python program to find document duplicates, using MD5 terms, was added. See src/python/samples/docdups.py.
- /media was added to the default skippedPaths list mostly as a reminder that blindly processing these with the general indexer is a bad idea (use separate indexes instead).
- recollq and recoll -t get a new option -N to print field names between values when -F is used. In addition, -F "" is taken as a directive to print all fields.
- Unicode hyphen (0x2010) is now translated to ASCII minus during indexing and searching. There is no good way to handle this character, given the various misuses of minus and hyphen. This choice was deemed "less bad" than the previous one.
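
To illustrate the change in compound term generation described above (the jfd@recoll.org example), here is a small Python sketch. It is not Recoll's actual text splitter: the function names and the set of separator characters are assumptions made for the example. It only shows the difference between generating the simple terms plus spans anchored at the start, and generating every contiguous span.

```python
import re

# Assumed separator set for this illustration only.
_SEPARATORS = r"([@.\-_])"


def _split(text):
    """Split a compound into its components and the separators between them."""
    parts = re.split(_SEPARATORS, text)
    return parts[0::2], parts[1::2]


def _span(words, seps, start, end):
    """Rebuild the compound covering components start..end inclusive."""
    out = words[start]
    for i in range(start + 1, end + 1):
        out += seps[i - 1] + words[i]
    return out


def anchored_terms(text):
    """Previous behaviour as described: simple terms, then spans anchored at the start."""
    words, seps = _split(text)
    terms = list(words)
    terms += [_span(words, seps, 0, e) for e in range(1, len(words))]
    return terms


def all_span_terms(text):
    """New behaviour as described: every contiguous span of components."""
    words, seps = _split(text)
    n = len(words)
    return [_span(words, seps, s, e) for s in range(n) for e in range(s, n)]


old = anchored_terms("jfd@recoll.org")
new = all_span_terms("jfd@recoll.org")
print(old)                    # ['jfd', 'recoll', 'org', 'jfd@recoll', 'jfd@recoll.org']
print(set(new) - set(old))    # {'recoll.org'}
```

The only extra term for this address is recoll.org, which is what makes a search for all emails from a given domain possible.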