Release notes for Recoll 1.19.x
Caveats
Installing over an older version: 1.19
Case/diacritics sensitivity is still off by default for this release. It can be turned on only by editing recoll.conf (see the manual). If you do so, you must then reset the index.
To be safe, always reset the index when upgrading to 1.19. There was a persistent index corruption issue in 1.18 and earlier versions.
The simplest way to do this is to quit all Recoll programs and just delete the index directory (rm -rf ~/.recoll/xapiandb), then start recoll or recollindex.
recollindex -z will do the same in most, but not all, cases. It's better to use the rm method, which will also ensure that no debris from older releases remains (e.g. old stemming files which are no longer used).
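For scripted upgrades, the rm method above can be sketched in Python. This is only a minimal illustration: the function name reset_index and its parameters are hypothetical, and the default location simply mirrors the ~/.recoll/xapiandb path given above. As with the manual method, all Recoll programs should be stopped first.

```python
import shutil
from pathlib import Path


def reset_index(confdir="~/.recoll", dbdir="xapiandb"):
    """Delete the index directory under confdir.

    Hypothetical helper: equivalent to `rm -rf <confdir>/<dbdir>`.
    Returns True if a directory was removed, False if there was none.
    """
    index_path = Path(confdir).expanduser() / dbdir
    if not index_path.is_dir():
        return False
    # Only the index directory is removed; recoll.conf and other
    # files in confdir are left untouched.
    shutil.rmtree(index_path)
    return True
```

Calling reset_index() with no arguments targets the default location; the index is then rebuilt the next time recoll or recollindex runs.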
Installing 1.19 over a 1.18 index will force a lot of reindexing anyway, because Recoll switched from st_mtime to st_ctime to detect file modifications: all files which were modified after their creation will be updated.
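The effect of switching timestamps can be illustrated with a small sketch. The "size:timestamp" signature layout and the function names are assumptions made for this example, not necessarily what Recoll stores; the point is only that a signature recorded from st_mtime stops matching once the comparison uses st_ctime and the two values differ.

```python
from types import SimpleNamespace


def file_signature(st, use_ctime=True):
    """Return a change-detection signature for an os.stat_result-like object.

    The "size:timestamp" layout is a hypothetical choice for this sketch.
    """
    stamp = st.st_ctime_ns if use_ctime else st.st_mtime_ns
    return "%d:%d" % (st.st_size, stamp)


def needs_reindex(stored_signature, st, use_ctime=True):
    """True when the stored signature no longer matches the file's state."""
    return stored_signature != file_signature(st, use_ctime)


# Synthetic stat data: content last written at t=100, followed by an
# inode change (chmod, rename, extended attribute update...) at t=200.
st = SimpleNamespace(st_size=10, st_mtime_ns=100, st_ctime_ns=200)

# Signature as an mtime-based index would have recorded it.
old_signature = file_signature(st, use_ctime=False)

print(needs_reindex(old_signature, st, use_ctime=False))  # False
print(needs_reindex(old_signature, st, use_ctime=True))   # True
```

With real files, the result of os.stat(path) can be passed in place of the synthetic object.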
Viewer exceptions: as in 1.18 (but we kept this section for 1.17 users), there is a list of mime types that should be opened with the locally configured application even when Use Desktop Preferences is checked. This allows making use of new functions (direct access to page), which may not be available through the desktop's xdg-open. The default list contains PDF, Postscript and DVI, which should be opened with the evince (or atril for Mint/MATE users) viewer for the page access functions to work. If you want to keep the previous behaviour (losing the page number functionality), you need to prune the list after installation. This can be done from the Preferences->Gui Configuration menu.
Minor releases at a glance
- 1.19.14p2 fixes another reference count issue in the Python module (a problem with the Query iterator exposed by the change in 1.19.14p1). It also changes the handling of diacritics for Bengali (accents are no longer stripped, as for Hindi).
- 1.19.14p1 fixes a descriptor and memory leak in the Python module. The main library and programs are unchanged.
- 1.19.14 fixes relatively minor but annoying issues in indexing, plus a few other glitches:
- The use of a separate read-only Database object for querying the index while indexing would trigger Xapian errors (bad block reads) and subsequent up-to-date check failures (leading to unnecessary reindexing). The jury is out as to the cause, but using the same object for reading and writing seems to eliminate the problem. This is linked to a Xapian ticket.
- An unnecessary log message in the child process between forking and executing the filter could block on a mutex, and lead to a 20-minute timeout for the affected thread in the parent process (this happened only in multithreaded mode).
- A possible overflow of the filter stack was also fixed. This could only really happen in pathological situations (hand-crafted recursive zip file...).
- 1.19.13 hopefully fixes the rare but longstanding multithread indexing crashes, which I hope were actually due to the now corrected mismanagement of Xapian::Document objects. It also silences noisy but mostly harmless ppt-dump.py crashes.
- 1.19.12p2 fulfills an old promise that I had forgotten: have a double-click in the result table run an "open file" action. It had just waited for too long...
- 1.19.12p1 fixes the 1.19.12 install script which did not actually copy the xls filter...
- 1.19.12 adds a parameter for setting the truncation size of the stored metadata attributes, and a new XLS filter.
- 1.19.11 is a fix to the install script in 1.19.10. The latter did not copy the new ppt extraction code to the filters directory.
- 1.19.10 has a few more changes than usually go into a Recoll minor version, and could have been 1.20.0 instead. On the other hand, it brings some features which needed to be released, and did not really warrant a major version. So here goes:
- Python3 compatibility for the Python Recoll module.
- An Ubuntu Unity Scope for saucy (13.10), replacing the lens (this needed Python 3).
- A new PPT format text extractor. Catppt just did not extract anything from more recent .ppt files.
- 1.19.9 fixes a few significant bugs, mostly a very serious one about date filtering, and a quite common GUI crash.
- 1.19.8 changes the way we handle Hindi / Devanagari text (no more stripping of diacritics), and also has a fix for the results table dups and snippets links.
- 1.19.7 is 1.19.5 with a few build and packaging fixes. No need to update.
- 1.19.5 works around a Linux kernel bug that would make it impossible to index data from a network share mounted through cifs (this worked in 1.18 and stopped working in 1.19 because of its wider use of extended attributes).
- 1.19.4 has a German translation, and a few fixes for relatively annoying bugs.
- 1.19.3 has more translations (Spanish, Russian, Czech), and a few minor bug and usability fixes.
- 1.19.2 fixes a bug in path translations for additional indexes.
- 1.19.1 was released 2 hours after 1.19.0 (book of records anyone?) because of a bug in the advanced search history feature which crashed the GUI as soon as a filename search was performed.
Changes in Recoll 1.19.0
- Indexing can use multiple threads. This can be a major performance boost for people with multiprocessor machines and big indexes. The threads setup is roughly auto-configured when recollindex starts, based on the number of processors, but it is also possible to tailor it in the configuration. There is a section in the manual to describe the configuration, and also some notes about the transformation and the performance improvements.
- There is a new result list/table popup menu option to display all the sub-documents for a given one. This is mostly useful to display the attachments to an email. The resulting screen can be used to select multiple entries and save them to files.
- It is now possible to use OR with "dir:" clauses, and wildcards have been enabled.
- When the option to follow symbolic links is not set (which is the default), symbolic links are now indexed as such (name and content).
- The advanced search panel now has a history feature. Use the up/down arrows to walk the search history list.
- There are new GUI configuration options to run in "search as you type" mode (which I don't find useful at all...), and to disable the Qt auto-completion inside the simple search string. The completion was often more confusing and annoying than useful, especially because it is case-insensitive when case sometimes matters for Recoll searches (capitalization to disable stemming).
- When the option to collapse identical results is used, documents which do have duplicates are shown with a link to list the clones. This function needs new data from the index, so it will only completely work after a full 1.19 reindex.
- Recoll should now behave reasonably on video files: index the name and propose an Open button in the result list to start the configured player.
- Thanks to Recoll user Koniu, you can now access your Recoll indexes through a Web browser interface. The server side is based on the Bottle Python Web framework and the Recoll Python module, and can run self-contained (no need to run Apache or another web server), so it's quite simple to set up. See the Recoll WebUI project on GitHub.
- Thanks to Recoll user David, there is now a filter to index and retrieve Lotus Notes messages. See the software site on sourceforge and some notes from a user with a slightly different configuration.
- There is a new path translation facility, with a GUI interface, to make it easier to share an index from a network share on clients on which the mount points might be different. This could also probably be put to use to design a "portable index" feature (for removable media).
- The first indexing run after Recoll installation (for a new user) will run in a fashion which will put data likely to be useful into the index faster, so that an impatient user can more quickly try searches.
- Implemented a cache for the last uncompressed file. This will greatly improve usability, e.g. for people fetching successive messages from a compressed mail folder.
- Recollindex will now change its current directory to a temporary one (e.g. /tmp) to mitigate the problems of some filters creating temporary files and not cleaning them.
- There is a new recursive explicit reindex option to the command line indexer.
- The default result list paragraph format has been slightly tweaked (removed the relevance percentage and small ordering and formatting changes).
- Mime type wildcard expansion is now performed against the index, not the configuration. This fixes many problems when searching for, e.g., media files indexed only by name.
- The choice for case/diacritics sensitivity is now fully processed during wildcard expansion (for case-sensitive indexes).
- The Snippets popup (list of pages and excerpts typically produced for PDF documents) can now use an external CSS stylesheet. This is useful because the Qt Webkit objects do not fully inherit the Qt configuration so that, for example, a style sheet is needed for using a different background color. The style sheet is chosen from the Preferences->GUI configuration->Result list panel.
- Improved handling of filters during indexing, resulting in fewer subprocesses.
- Added a function to import tags from an external application (e.g. Tmsu).
- Changed the format of the rclaptg field. It was colon-separated, and now uses the normal value/attributes syntax with an empty value, like:
localfields = ; attr1 = val1 ; attr2 = val2
- Extended file attributes are now indexed by default. As a side effect, Recoll now uses st_ctime, not st_mtime, to detect file changes. This means that installing 1.19 will reindex many files (all those that were modified after their creation). Recoll also now processes the charset and mime_type standardized extended attributes.
- The Python module has been expanded to include the interface for extracting data. This means that you could now write most of the Recoll GUI in Python if you wished. There is a bit of sample code in the source package doing just this. A few incompatible changes had to be made to the Python module. In particular, the "Query.next" field is gone and the module structure has been changed (a different import statement is needed). Adapting your code is trivial: have a look at the changes in the Unity Lens module for an example. The new module is compatible with the Python Database API Specification v2.0 for the parts that make sense for a non-relational DB.
- Recoll now uses a dynamic library for the code shared by the query interface, the indexer and the Python module. This should have no visible impact but was made necessary by the changes to the Python module.
- And quite a few Fixed bugs
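The idea behind the path translation facility listed above can be sketched as a simple prefix rewrite. This is only an illustration of the concept, not Recoll's actual implementation: the function name, the dictionary layout and the example paths are all hypothetical.

```python
def translate_path(path, translations):
    """Rewrite the longest matching prefix of path.

    translations maps a path prefix as recorded in the index to the
    prefix which is valid on the local machine (hypothetical layout).
    Paths which match no rule are returned unchanged.
    """
    # Try longer prefixes first so that the most specific rule wins.
    for old in sorted(translations, key=len, reverse=True):
        prefix = old.rstrip("/")
        # Match whole path components only: "/net/share" must not
        # match "/net/shared".
        if path == prefix or path.startswith(prefix + "/"):
            return translations[old].rstrip("/") + path[len(prefix):]
    return path


# Example: the index was built with the share mounted on /net/share,
# while this client mounts it on /mnt/nas.
rules = {"/net/share": "/mnt/nas"}
print(translate_path("/net/share/docs/a.pdf", rules))  # /mnt/nas/docs/a.pdf
```

Matching on whole path components avoids rewriting unrelated directories which merely share a name prefix.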