Release notes for Recoll 1.20.x
Caveats
Installing over an older version: 1.19
Installing 1.20 over a 1.19 index is possible, but there have been small changes in the way compound words (e.g. email addresses) are indexed, so it is best to reset the index. Still, in a pinch, 1.20 search can mostly use a 1.19 index.
Case/diacritics sensitivity is off by default. It can be turned on only by editing recoll.conf (see the manual). If you do so, you must then reset the index.
Always reset the index if you do not know which version created it (that is, you are not sure it is at least 1.18). The best method is to quit all Recoll programs and delete the index directory (rm -rf ~/.recoll/xapiandb), then start recoll or recollindex. recollindex -z will do the same in most, but not all, cases. It is better to use the rm method, which also ensures that no debris from older releases remains (e.g. old stemming files which are no longer used).
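The rm method can also be scripted. The following is a minimal Python sketch of the same step, assuming the default index location under ~/.recoll mentioned above; the function name reset_index is hypothetical and not part of Recoll. Quit all Recoll programs before running it, then start recoll or recollindex to rebuild the index.

```python
import shutil
from pathlib import Path


def reset_index(config_dir: Path) -> bool:
    """Delete the Xapian index directory found under config_dir.

    Equivalent in spirit to `rm -rf <config_dir>/xapiandb`.
    Returns True if a directory was removed, False if none existed.
    """
    index_dir = config_dir / "xapiandb"
    if not index_dir.is_dir():
        return False
    shutil.rmtree(index_dir)
    return True


# Typical call for the default configuration directory (assumption):
# reset_index(Path.home() / ".recoll")
```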
Minor releases at a glance
The pace of change in Recoll is slowing as the software approaches maturity. To avoid stalling progress with excessively long intervals between releases, the first versions of 1.20 will be allowed to contain some functional changes (as opposed to only bug fixes). There will be a freeze at some point.
Changes in Recoll 1.20.0p1
- An Open With entry was added to the result list and result table popup menus. This lets you choose an alternative application to open a document. The list of applications is built from the information inside the /usr/share/applications desktop files.
- A new way to specify multiple terms to be searched inside a given field: an entry lacking whitespace but splittable, like [term1,term2], used to be transformed into a phrase search, which made sense in some cases, but not so many. The code was changed so that [term1,term2] now means [term1 AND term2], and [term1/term2] means [term1 OR term2]. This is useful for field searches where you would previously be forced to repeat the field name for every term: [somefield:term1 somefield:term2] can now be expressed as [somefield:term1,term2].
- We changed the way terms are generated from a compound string (e.g. an email address). Previously, for an address like jfd@recoll.org, only the simple terms and the terms anchored at the start were generated (jfd, recoll, org, jfd@recoll, jfd@recoll.org). The new text splitter generates all the other possible terms (here, recoll.org only), so that it is now possible to search for left-truncated versions of the compound, e.g., all emails from a given domain.
- Recoll now indexes #hashtags as such.
- It is now possible to configure the GUI in wide form factor by dragging the toolbars to one of the sides (their location is remembered between sessions), and moving the category filters to a menu (can be set in the "Preferences->GUI configuration" panel).
- We added the indexedmimetypes and excludedmimetypes variables to the configuration GUI, which was also compacted a bit. A bunch of uninteresting variables were also removed.
- When indexing, we no longer add the top container file name as a term for the contained sub-documents (if any). This made no sense in most cases, as it meant that you would get hits on all the sections from a chm or epub when the top file name matched the search, when you probably wanted only the parent document in this case. However, the container file name was sometimes useful for filtering results, and it is still accessible in a different way: the top container file name is added as a term to all the sub-documents, only for searching with a prefix. The field name is containerfilename, and no match on the sub-documents will occur if the field is not specified (this is different from the previous filename processing, where the name was indexed as a general term). containerfilename is also set on files without sub-documents (e.g. a pdf).
- A new attribute, pfxonly, was created to support the above change. This can be set on any metadata field inside the [prefixes] section of the fields file. The affected field terms will be indexed only with a prefix, so they will cause a hit only for a field search (the general behaviour is that field terms are indexed both prefixed and not, so they can also cause a hit when searched as general terms).
- A new [queryaliases] section was created in the fields file, for defining field name aliases to be used only at query time (to avoid unwanted collection of data on random fields during indexing). The section is empty by default, but two obvious aliases are provided, commented out: filename=fn and containerfilename=cfn. Setting them in your personal file may save you some typing if you search on file names.
- You can now use both -e and -i for erasing then updating the index for the given file arguments with the same recollindex command.
- We now allow access to the Xapian docid for Recoll documents in recollq and Python API search results. This allows writing scripts which combine Recoll and pure Xapian operations. A sample Python program to find document duplicates using MD5 terms was added: see src/python/samples/docdups.py.
- The command used to identify the mime types of files when the internal method fails is file -i by default. It is now possible to customize this command by setting the systemfilecommand variable in the configuration. A suggested value would be xdg-mime, which sometimes works better than file.
- The result list has two new elements: %P substitution for printing the parent folder name, and an F link target which will open the parent folder in a file manager window.
- /media was added to the default skippedPaths list mostly as a reminder that blindly processing these with the general indexer is a bad idea (use separate indexes instead).
- recollq and recoll -t get a new option -N to print field names between values when -F is used. In addition, -F "" is taken as a directive to print all fields.
- Unicode hyphen (0x2010) is now translated to ASCII minus during indexing and searching. There is no good way to handle this character, given the various misuses of minus and hyphen. This choice was deemed "less bad" than the previous one.
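To make the new multi-term field syntax concrete, here is a small Python sketch that expands the [somefield:term1,term2] and [somefield:term1/term2] shorthands into their longhand equivalents. It is an illustration only, not Recoll's query parser: the function name expand_field_shorthand is hypothetical, and the sketch assumes a single unquoted clause using only one kind of separator.

```python
def expand_field_shorthand(clause: str) -> str:
    """Expand 'field:t1,t2' or 'field:t1/t2' into the longhand query.

    ',' is read as AND and '/' as OR, as described in the notes above.
    Clauses without a field prefix or separator are returned unchanged.
    """
    field, sep, value = clause.partition(":")
    if not sep:
        return clause  # no field prefix, nothing to expand
    if "," in value:
        terms, op = value.split(","), " AND "
    elif "/" in value:
        terms, op = value.split("/"), " OR "
    else:
        return clause  # single term, already in longhand form
    # Repeat the field name for every non-empty term.
    return op.join(f"{field}:{term}" for term in terms if term)


print(expand_field_shorthand("somefield:term1,term2"))
print(expand_field_shorthand("somefield:term1/term2"))
```

The first call prints the AND form with the field name repeated for each term, which is the typing that the shorthand saves.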
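The change in compound-term generation can be illustrated with a short Python sketch. It is not the actual Recoll text splitter: the function name compound_terms, the anchored_only flag, and the choice of "@" and "." as the only separators are assumptions made for the example. With anchored_only=True it mimics the older behaviour (simple terms plus spans anchored at the start); by default it emits every contiguous span.

```python
import re
from typing import List


def compound_terms(text: str, anchored_only: bool = False) -> List[str]:
    """Return the terms generated from a compound such as an email address."""
    # Capturing group keeps the separators: words sit at even indices.
    parts = re.split(r"([@.])", text)
    word_count = len(parts[0::2])
    terms = []
    for start in range(word_count):
        if anchored_only and start > 0:
            # Older behaviour: past the first word, only the simple term.
            terms.append(parts[2 * start])
            continue
        for end in range(start, word_count):
            # Join words start..end together with the separators between them.
            terms.append("".join(parts[2 * start : 2 * end + 1]))
    return terms


old = set(compound_terms("jfd@recoll.org", anchored_only=True))
new = set(compound_terms("jfd@recoll.org"))
print(sorted(new - old))  # ['recoll.org']
```

The only extra term for jfd@recoll.org is recoll.org, which is what makes a search for all emails from a given domain possible.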
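Scripts that prepare query strings or compare terms outside Recoll may want to apply the same hyphen translation. A minimal Python sketch follows; the function name normalize_hyphens is hypothetical, and only the U+2010 mapping described above is applied.

```python
# Map Unicode HYPHEN (U+2010) to ASCII hyphen-minus (U+002D).
HYPHEN_MAP = str.maketrans({"\u2010": "-"})


def normalize_hyphens(text: str) -> str:
    """Replace every U+2010 in text with an ASCII minus sign."""
    return text.translate(HYPHEN_MAP)


print(normalize_hyphens("well\u2010known"))  # well-known
```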