Release notes for Recoll 1.18.x
Caveats
Installing over an older version: 1.18 introduces significant index format changes to support optional character case and diacritics sensitivity, and it is advisable to reset the index in most cases. The best way to do this is to destroy the index directory (rm -rf ~/.recoll/xapiandb).
If 1.18 is not configured for case and diacritics sensitivity, it is mostly compatible with 1.17 indexes.
Case/diacritics sensitivity is off by default for this release. It can be turned on only by editing recoll.conf (see the manual). If you do so, you must then reset the index.
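As a rough sketch of the recoll.conf edit mentioned above, the shell function below rewrites the file so that the sensitivity setting comes first, in the global part of the configuration. The variable name indexStripChars and the value 0 are assumptions that are not stated in these notes, and enable_case_sensitivity is a hypothetical helper name; check the manual before relying on either.

```shell
# Sketch only: set the (assumed) indexStripChars variable in recoll.conf.
enable_case_sensitivity() {
    rc_conf="${1:-$HOME/.recoll/recoll.conf}"
    rc_new="$rc_conf.new"
    # Assumption: indexStripChars = 0 keeps raw terms (case and diacritics).
    # The line is written first so that it sits in the global part of the
    # file, before any [directory] section.
    echo 'indexStripChars = 0' > "$rc_new"
    if [ -f "$rc_conf" ]; then
        # Copy everything else, dropping any earlier indexStripChars line.
        grep -v '^[[:space:]]*indexStripChars[[:space:]]*=' "$rc_conf" >> "$rc_new" || true
    fi
    mv "$rc_new" "$rc_conf"
}

# Example (the index must be reset afterwards):
#   enable_case_sensitivity ~/.recoll/recoll.conf
```

Keeping a backup copy of recoll.conf before running something like this is prudent, since the file is replaced in place.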
Always reset the index if installing over an even older version (1.16 and older). The simplest way to do this is to quit all recoll programs and just delete the index directory (rm -rf ~/.recoll/xapiandb), then start recoll or recollindex. recollindex -z will do the same in most, but not all, cases.
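The reset procedure above could be wrapped as follows. This is a minimal sketch: reset_recoll_index is a hypothetical helper name, and the default path is the one quoted in these notes.

```shell
# Sketch only: delete the Xapian index directory of a Recoll configuration.
reset_recoll_index() {
    rc_dir="${1:-$HOME/.recoll}"
    # Do nothing, and say so, if there is no index directory to remove.
    if [ ! -d "$rc_dir/xapiandb" ]; then
        echo "no index directory under $rc_dir" >&2
        return 1
    fi
    rm -rf "$rc_dir/xapiandb"
}

# Typical use, after quitting all recoll programs:
#   reset_recoll_index && recollindex
```

Only the xapiandb subdirectory is removed; the rest of the configuration directory is left alone.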
The subdirectories of xapiandb which were previously used to store the stem expansion database (stem_english, stem_french...) are not used anymore, because the data is now stored in the Xapian synonyms table. They will stay around if you do nothing about them, so you may want to delete them if you have not chosen to just delete the whole index directory.
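If you keep the index, the leftover stem directories can be cleaned up with something like the sketch below. The stem_* pattern follows the directory names given above; remove_old_stem_dirs is a hypothetical helper name.

```shell
# Sketch only: remove the obsolete stem_* subdirectories of an index.
remove_old_stem_dirs() {
    rc_db="${1:-$HOME/.recoll/xapiandb}"
    for rc_d in "$rc_db"/stem_*; do
        # The pattern stays literal when nothing matches, so check first.
        [ -d "$rc_d" ] || continue
        echo "removing $rc_d"
        rm -rf "$rc_d"
    done
}

# Example:
#   remove_old_stem_dirs ~/.recoll/xapiandb
```

Other files and directories inside xapiandb are not touched.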
Viewer exceptions: There is a new list of MIME types that should be opened with the locally configured application even when Use Desktop Preferences is checked. This allows making use of new functions (direct access to page), which could not be available through the desktop's xdg-open. The default list contains PDF, Postscript and DVI, which should be opened with the evince (or atril for Mint/MATE users) viewer for the page access functions to work. If you want to keep the previous behaviour (losing the page number functionality), you need to prune the list after installation. This can be done from the Preferences->Gui Configuration menu.
Changes
Recoll 1.18 has some major changes, the most visible of which is the ability to search for exact matches of character case and diacritics.
- The index can now be configured for case and diacritics sensitivity, in which case raw terms are indexed. On such an index, search insensitivity to case and diacritics is obtained, when desired, by query time expansion, in a similar manner to what is used for stemming. See the manual chapter for details about controlling the feature. The capacity for case/diacritics sensitivity is off by default, and you should not see differences in this respect after upgrading if you do not turn it on explicitly. Even on a raw index, most searches should behave like they did in 1.17. Sensitivity must be explicitly requested in most cases.
- The advanced search screen now has a history function. While the focus is in this window, you can walk the history of searches using the up and down arrows.
- Recoll has a new capacity to store page break locations and use them when opening a document at the location for a given match. It will also pass a search string to the viewer application. This currently works with PDF, Postscript and DVI documents, and, optimally, the evince viewer.
- The GUI result list has a new "snippets" window for documents with page numbers, which lets the user choose a snippet and open the document at the appropriate page.
- There is a list of MIME types that should be opened with the locally configured application even when Use Desktop Preferences is checked. This will permit, for example, using evince for its page access capabilities on PDF files, while letting the desktop handle all the other MIME types. The list is not empty by default: it contains PDF, Postscript and DVI, so you may want to reset it after installation if you want to keep the previous behaviour (losing the page number functionality). This can be done from the Preferences->Gui Configuration menu.
- We now allow multiple directory specifications in the query language, as in: dir:/home/me -dir:tmp
- The search inside the GUI preview window has been improved, and allows selecting one of the initial term groups from a list as the search target.
- A new script dedicated to laptops can start or stop recollindex according to mains power status.
- Added <pre style="white-space: pre-wrap"> to plain text HTML display options. This will often be the best option to display plain text: it will better respect indentation, while folding long lines.
- When running in a UTF-8 locale, and after failing to decode a plain text file as UTF-8, indexing will try again using an 8 bit character set heuristically chosen according to the locale language code. This uses the LANG environment variable.
- On initial installation (when the ~/.recoll directory does not exist), recoll will install a list of characters which should not be stripped of diacritics, according to the detected national language (based on $LANG). There are currently specific lists for German (don't strip the umlauts), and Nordic languages (keep the letters with circle above in addition to the German list). Other languages currently only have exceptions which result in decomposing ligatures (fl, fi etc.). You can have a look at the standard recoll.conf in /usr/share/recoll/examples for more information.
- A new configuration variable, maxmemberkbs, has been implemented to limit the size of archive members we process. This will avoid trying to read a 4 GB ISO from a zip archive as happened in the past...
- Proper error reporting when a wildcard expansion is truncated for size. An incomplete search could previously be performed without any indication.
- More effort now goes into choosing the terms used to generate the snippets inside the result list.
- Recoll now uses the Xapian "synonyms" mechanism to store all data about stemming, case, and diacritics expansion (this replaces the previous ad-hoc stemming expansion mechanism).
- Partial autodetection of thunderbird mailboxes found outside the configured location.
- Fixed bugs:
  - The unac_except_trans mechanism could be buggy in some cases and generate wrong character translations.
  - Don't terminate monitor for permissions-related addwatch error.
  - Fix handling of ODF documents exported by Google docs.
  - It was previously impossible to open the parent of an embedded document (e.g. the CHM file for an HTML page inside the CHM) if the parent was itself a member of an archive.
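
The plain text decoding fallback listed among the changes (retry with an 8 bit character set chosen from the locale language code) can be sketched roughly as follows. This is not Recoll's code: the language-to-charset table is purely illustrative, both function names are hypothetical, the check that the locale itself is UTF-8 is left out, and the iconv command line tool stands in for whatever conversion the indexer really performs.

```shell
# Illustrative mapping only; the table Recoll really uses is not given here.
charset_for_lang() {
    cs_lang="${1%%[_.@]*}"      # keep the language code: "de_DE.UTF-8" -> "de"
    case "$cs_lang" in
        ru|uk|bg)    echo "CP1251" ;;
        pl|cs|hu)    echo "ISO-8859-2" ;;
        el)          echo "ISO-8859-7" ;;
        *)           echo "ISO-8859-1" ;;
    esac
}

# Print a plain text file as UTF-8, falling back to the guessed charset.
# The second argument defaults to $LANG.
text_to_utf8() {
    cs_file="$1"
    cs_locale="${2:-${LANG:-C}}"
    if iconv -f UTF-8 -t UTF-8 "$cs_file" >/dev/null 2>&1; then
        # The file already decodes as UTF-8: pass it through unchanged.
        cat "$cs_file"
    else
        # Decoding failed: retry with the charset guessed from the language.
        iconv -f "$(charset_for_lang "$cs_locale")" -t UTF-8 "$cs_file"
    fi
}

# Example:
#   text_to_utf8 notes.txt "$LANG"
```

The validation pass is done separately from the output so that a file which fails halfway through is never emitted partially decoded.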