Release notes for Recoll 1.20.x
Caveats
Installing over an older version: 1.19
Installing 1.20 over a 1.19 index is possible, but there have been small changes in the way compound words (e.g. email addresses) are indexed, so it is best to reset the index. Still, in a pinch, 1.20 search can mostly use a 1.19 index.
Always reset the index if you do not know which version created it (that is, if you are not sure it is at least 1.18). The best method is to quit all Recoll programs and delete the index directory (rm -rf ~/.recoll/xapiandb), then start recoll or recollindex. recollindex -z will do the same in most, but not all, cases. It is better to use the rm method, which also ensures that no debris from older releases remains (e.g. old stemming files which are no longer used).
Case/diacritics sensitivity is off by default. It can be turned on only by editing recoll.conf (see the manual). If you do so, you must then reset the index.
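The rm-based reset described above could be scripted roughly as follows. This is a minimal sketch, not part of Recoll: the function name reset_index and its parameters are hypothetical, it assumes the index lives in the default ~/.recoll/xapiandb location named above, and it assumes all Recoll programs have already been quit.

```python
import shutil
import subprocess
from pathlib import Path


def reset_index(confdir=Path.home() / ".recoll", run_indexer=True):
    """Delete <confdir>/xapiandb, then optionally rebuild with recollindex.

    Hypothetical helper: assumes the default index location and that no
    Recoll program is currently running.
    Returns True if an index directory existed and was removed.
    """
    dbdir = Path(confdir) / "xapiandb"
    existed = dbdir.exists()
    # Same effect as: rm -rf ~/.recoll/xapiandb
    shutil.rmtree(dbdir, ignore_errors=True)
    if run_indexer:
        # Rebuild from scratch; recollindex must be on the PATH.
        subprocess.run(["recollindex"], check=True)
    return existed
```

Calling reset_index() with no arguments removes the default index directory and then runs recollindex; passing run_indexer=False only removes the directory.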
Minor releases at a glance
- 1.20.6 fixes some decompression issues which had serious performance and system load consequences in some cases (depending on data): it performs a minimal check that enough temp space is available before uncompressing, and it no longer needlessly uncompresses tar.gz files. Also: rclscribus fixes, a Danish translation, and fixes for messages in two places which were not translated.
Uncompressing big files to /tmp for nothing (and re-doing it on the next indexing pass...) was, I believe, the main remaining reason why Recoll indexing could cause system performance issues.
- 1.20.5 is 1.20.4 with a few Qt 5 compatibility tweaks. Builds and runs with Qt 5.3.2, fails with 5.2.
- 1.20.4 has a fix to skip compressed file system images like xxx.img.gz by default. This should have been in 1.20.3.
- 1.20.3 has a very minor change to copy the Query Fragments Window config file from the default if it does not exist in the user config dir.
- 1.20.2 fixes a bug which prevented the real time indexer from indexing the web history queue (this was still processed when starting up). It also adds systray capability to the GUI.
Changes in Recoll 1.20.1
- An Open With entry was added to the result list and result table popup menus. This lets you choose an alternative application to open a document. The list of applications is built from the information inside the /usr/share/applications desktop files.
- A new way of specifying multiple terms to be searched inside a given field: it used to be that an entry lacking whitespace but splittable, like [term1,term2], was transformed into a phrase search, which made sense in some cases, but not in many. The code was changed so that [term1,term2] now means [term1 AND term2], and [term1/term2] means [term1 OR term2]. This is useful for field searches where you would previously be forced to repeat the field name for every term. [somefield:term1 somefield:term2] can now be expressed as [somefield:term1,term2].
- (1.20.1) The Query Fragments tool was added to the GUI. This is a window with customizable buttons to add arbitrary query language fragments to the current search. The buttons and fragments are defined in an xml file inside the recoll configuration directory ~/.recoll/fragbuts.xml. This makes it easy to define "pre-cooked" filters for things that you need repeatedly. See the manual for more details.
- We changed the way terms are generated from a compound string (e.g. an email address). Previously, for an address like jfd@recoll.org, only the simple terms and the terms anchored at the start were generated (jfd, recoll, org, jfd@recoll, jfd@recoll.org). The new text splitter generates all the other possible terms (here, recoll.org only), so that it is now possible to search for left-truncated versions of the compound, e.g., all emails from a given domain.
- (1.20.1) New keyboard accelerators for the result table: Ctrl+r switches the focus from the search entry to the table, Ctrl+o opens the document for the current line, Ctrl+Shift+o opens document and closes recoll, Ctrl+d previews the document.
- (1.20.1) A special term is now indexed for results from the web history: use "-rclbes:BGL" to exclude the web results, "rclbes:BGL" to restrict the results to the web ones. This is difficult to remember, but the Query Fragments feature means that you don't need to (this is in the sample Query Fragments file).
- Recoll now indexes #hashtags as such.
- It is now possible to configure the GUI in wide form factor by dragging the toolbars to one of the sides (their location is remembered between sessions), and moving the category filters to a menu (can be set in the "Preferences->GUI configuration" panel).
- We added the indexedmimetypes and excludedmimetypes variables to the configuration GUI, which was also compacted a bit. A bunch of uninteresting variables were also removed.
- When indexing, we no longer add the top container file name as a term for the contained sub-documents (if any). This made no sense in most cases, as it meant that you would get hits on all the sections from a chm or epub when the top file name matched the search, when you probably wanted only the parent document in this case.
However, the container file name was sometimes useful for filtering results, and it is still accessible in a different way: the top container file name is added as a term to all the sub-documents, but only for searching with a prefix. The field name is containerfilename, and no match on the sub-documents will occur if the field is not specified (this is different from previous filename processing, where the name was indexed as a general term). containerfilename is also set on files without sub-documents (e.g. a pdf).
- A new attribute, pfxonly, was created to support the above change. This can be set on any metadata field inside the [prefixes] section of the fields file. The affected field terms will be indexed only with a prefix, so they will cause a hit only for a field search (the general behaviour is that field terms are indexed both prefixed and not, so they can also cause a hit when searched as general terms).
- A new [queryaliases] section was created in the fields file, for defining field name aliases to be used only at query time (to avoid unwanted collection of data on random fields during indexing). The section is empty by default, but two obvious aliases are provided as comments: filename=fn and containerfilename=cfn. Setting them in your personal file may save you some typing if you search on file names.
- You can now use both -e and -i for erasing then updating the index for the given file arguments with the same recollindex command.
- We now allow access to the Xapian docid for Recoll documents in recollq and Python API search results. This allows writing scripts which combine Recoll and pure Xapian operations. A sample Python program to find document duplicates, using MD5 terms, was added. See src/python/samples/docdups.py.
- The command used to identify the mime types of files when the internal method fails used to be hard-coded as file -i. It is now possible to customize this command by setting the systemfilecommand in the configuration. A suggested value would be xdg-mime, which sometimes works better than file.
- The result list has two new elements: a %P substitution for printing the parent folder name, and an F link target which will open the parent folder in a file manager window. For example: <a href='F%N'>Open parent directory</a>
- /media was added to the default skippedPaths list, mostly as a reminder that blindly processing these with the general indexer is a bad idea (use separate indexes instead).
- recollq and recoll -t get a new option -N to print field names between values when -F is used. In addition, -F "" is taken as a directive to print all fields.
- Unicode hyphen (0x2010) is now translated to ASCII minus during indexing and searching. There is no good way to handle this character, given the various misuses of minus and hyphen. This choice was deemed "less bad" than the previous one.
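To illustrate the comma and slash shorthand for field searches described in the list above, the following sketch expands a shorthand clause into the longer form with the field name repeated. It is only an illustration of the equivalence stated in these notes, not Recoll's query parser; the function name expand_field_terms is hypothetical, and the sketch assumes AND is expressed by plain juxtaposition, as in the [somefield:term1 somefield:term2] example.

```python
def expand_field_terms(clause):
    """Expand 'field:t1,t2' or 'field:t1/t2' into the repeated-field form.

    Illustration only: a comma joins terms with an implicit AND
    (juxtaposition), a slash joins them with OR.
    """
    field, sep, value = clause.partition(":")
    if not sep:
        return clause  # not a field clause, leave untouched
    if "," in value:
        terms, joiner = value.split(","), " "
    elif "/" in value:
        terms, joiner = value.split("/"), " OR "
    else:
        return clause  # single term, nothing to expand
    return joiner.join(f"{field}:{term}" for term in terms if term)


print(expand_field_terms("somefield:term1,term2"))
print(expand_field_terms("somefield:term1/term2"))
```

The first call prints the AND form with the field name repeated for each term, the second the OR form.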
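The change to compound-string term generation described in the list above can be pictured with a small sketch. This is not Recoll's text splitter: the function name compound_terms is hypothetical, and the sketch assumes that only "@" and "." act as separators and that the input has no empty components. It generates every contiguous run of simple words, which for jfd@recoll.org adds recoll.org to the terms that were generated previously.

```python
import re


def compound_terms(text):
    """Return all contiguous sub-spans of a compound such as an email address.

    Simplified illustration: splits on '@' and '.' only.
    """
    # re.split with a capturing group keeps the separators:
    # 'jfd@recoll.org' -> ['jfd', '@', 'recoll', '.', 'org']
    parts = re.split(r"([@.])", text)
    words, seps = parts[0::2], parts[1::2]
    terms = []
    for start in range(len(words)):
        span = words[start]
        terms.append(span)
        # Extend the span one word at a time, re-inserting the separator.
        for end in range(start + 1, len(words)):
            span += seps[end - 1] + words[end]
            terms.append(span)
    return terms


print(compound_terms("jfd@recoll.org"))
```

Spans starting at the first word correspond to the terms listed as previously generated; spans starting later are the left-truncated forms that make a search on a whole domain possible.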
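The hyphen translation in the last item amounts to a one-character mapping applied to both indexed text and queries. A minimal sketch of the idea, with the hypothetical name normalize_hyphen (Recoll's own implementation is not shown in these notes):

```python
# Map Unicode HYPHEN (U+2010) to the ASCII hyphen-minus (U+002D).
HYPHEN_MAP = {0x2010: "-"}


def normalize_hyphen(text):
    """Replace every U+2010 in text with an ASCII minus."""
    return text.translate(HYPHEN_MAP)


print(normalize_hyphen("well\u2010known") == "well-known")
```

Applying the same mapping at indexing and at search time is what lets a query typed with either character match the same terms.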