recoll / Code / [e35229] /website/release-1.19.html

[e35229]: website / release-1.19.html History

release-1.19.html 264 lines (247 with data), 14.5 kB

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <title>Recoll 1.19 series release notes</title>
  <meta name="Author" content="Jean-Francois Dockes">
  <meta name="Description"
  content="recoll is a simple full-text search system for unix and linux     based on the powerful and mature xapian engine">
  <meta name="Keywords" content="full text search, desktop search, unix, linux">
  <meta http-equiv="Content-language" content="en">
  <meta http-equiv="content-type" content="text/html; charset=UTF-8">
  <meta name="robots" content="All,Index,Follow">
  <link type="text/css" rel="stylesheet" href="styles/style.css">
</head>

<body>

<div class="rightlinks">
<ul>
  <li><a href="index.html">Home</a></li>
  <li><a href="download.html">Downloads</a></li>
  <li><a href="doc.html">Documentation</a></li>
</ul>
</div>

<div class="content">
<h1>Release notes for Recoll 1.19.x</h1>

<h2>Caveats</h2>

<p><em>Installing over an older version</em>: 1.19 </p>

<p>Case/diacritics sensitivity is still off by default for this release. It can
be turned on <em>only</em> by editing recoll.conf (<a
href="usermanual/usermanual.html#RCL.INDEXING.CONFIG.SENS">see the manual</a>).
If you do so, you must then reset the index.</p>

<p>To be safe, always reset the index when upgrading to 1.19. There
  was a <a href="#rodb">persistent index corruption issue</a> in 1.18
  and earlier versions.
  The simplest way to do this is to quit all Recoll
  programs and just delete the index directory (
  <span class="literal">rm��-rf��~/.recoll/xapiandb</span>), then start
  <code>recoll</code> or <code>recollindex</code>. <br>
  <span class="literal">recollindex��-z</span> ��will do the same in most, but
  not all, cases. It's better to use the <tt>rm</tt> method, which will also
  ensure that no debris from older releases remain (e.g.: old stemming files
  which are not used any more).</p>

<p>Installing 1.19 over an 1.18 index will force a lot of reindexing anyway
because Recoll switched to using <i>st_ctime</i> instead of <i>st_mtime</i> to
detect file modifications, meaning that all files which were modified since
created will be updated.</p>

<p><span class="important">Viewer exceptions</span>: as in 1.18 (but we kept
this section for 1.17 users), there is a list of mime types that should be
opened with the locally configured application even when <em>Use Desktop
Preferences</em> is checked. This allows making use of new functions (direct
access to page), which could not be available through the desktop's
<tt>xdg-open</tt>. The default list contains PDF, Postscript and DVI, which
should be opened with the <em>evince</em> (or <em>atril</em> for Mint/MATE
users) viewer for the page access functions to work. If you want to keep the
previous behaviour (losing the page number functionality), you need to prune
the list after installation . This can be done from the <em>Preferences-&gt;Gui
Configuration</em> menu.</p>

<h2><a name="minor_releases">Minor releases at a glance</a></h2>
<ul>
  <li>1.19.14p2 fixes another reference count issue in the Python
    module (a problem with the Query iterator put in evidence by the
    change in 1.19.14p1). It also changes the handling of diacritics
    for Bengali (accents are now unstripped, as for Hindi).</li>

  <li>1.19.14p1 fixes a descriptor and memory leak in the Python
    module. The main library and programs are unchanged.</li>

  <li>1.19.14 fixes relatively minor but ennoying issues in
    indexing, plus a few other glitches:
    <ul>
      <li id="rodb" class="important">The use of a separate readonly
        Database object 
        for querying the index while indexing would trigger Xapian
        errors, (bad block reads), and subsequent up-to-date check
        failures (leading to unnecessary reindexing). The jury is out
        as to the cause, but using the same object for reading and
        writing seems to eliminate the problem. This is linked to
        a <a href="http://trac.xapian.org/ticket/645">Xapian
        ticket</a>.</li>
      <li>An unnecessary log message in the child process between
        forking and executing the filter could block on a mutex, and
        lead to a 20 mn timeout for the affected father process thread
        (happened only in multithread mode).</li>
      <li>Also a possible overflow of the filter stack. This could only
        really happen in pathological situations (hand-crafted recursive
        zip file...).</li>
    </ul>
  </li>

  <li>1.19.13 hopefully fixes the rare but longstanding multithread
    indexing crashes, which I hope were actually due to the now
    corrected mismanagement of Xapian::Document objects. It also
    silences noisy but mostly harmless ppt-dump.py crashes.</li>
  <li>1.19.12p2 fulfills an old promise that I had forgotten: have a
    double-click in the result table run an "open file" action. It just
    had waited for too long...</li>
  <li>1.19.12p1 fixes the 1.19.12 install script which did not
    actually copy the xls filter...</li>
  <li>1.19.12 adds a parameter for setting the truncation size of the
    stored metadata attributes, and a new XLS filter.</li>
  <li>1.19.11 is a fix to the install script in 1.19.10. The latter
    did not copy the new ppt extraction code to the filters directory.</li>
  <li>1.19.10 has a bit more changes than
    usually goes into a Recoll minor version, and could have been
    1.20.0 instead. On the other hand, it
    brings some features which needed to be released, and did not
    really warrant a major version. So here goes:
    <ul>
      <li>Python3 compatibility for the Python Recoll module.</li>
      <li>A Ubuntu Unity Scope for saucy (13.10), replacing the
        lens (and which needed Python 3).</li>
      <li>A new PPT format text extractor. Catppt just did not extract
        anything from more recent .ppt files.</li>
    </ul> Mostly, if you are not running Ubuntu (Saucy or later), you
    can just <a href="filters/filters.html">download the ppt
    filter</a> and stay with 1.19.9. 
  </li> <li>1.19.9 fixes a few
    significant <a href="BUGS.html#b_1_19_8">bugs</a>, mostly a very
    serious one about date filtering, and a quite common GUI
    crash.</li>
  <li>1.19.8 changes the way we handle Hindi / Devanagari
    text (no more stripping of diacritics), and also has a fix for the
    results table dups and snippets links.</li>
  <li>1.19.7 is 1.19.5 with a few build and packaging fixes. No need
    to update.</li>
  <li>1.19.5 works around a Linux kernel bug that would make it
    impossible to index data from a network share mounted through cifs
    (this worked in 1.18 and stopped working in 1.19 because of its
    wider use of extended attributes)</li>
  <li>1.19.4 has a German translation, and a few fixes
    for <a href="BUGS.html#b_1_19_3">relatively ennoying
    bugs</a>.</li> 
  <li>1.19.3 has more translations (Spanish, Russian, Czech), and a few
    minor bug and usability fixes.</li>
  <li>1.19.2 fixes a bug in path translations for additional indexes.</li>
  <li>1.19.1 was released 2 hours after 1.19.0 (book of records anyone?)
    because of a bug in the advanced search history feature which crashed
    the GUI as soon as a <i>filename</i> search was performed.</li>
</ul>

<h2>Changes in Recoll 1.19.0</h2>
<ul>
  <li>Indexing can use multiple threads. This can be a major performance boost
    for people with multiprocessor machines and big indexes. The threads setup
    is roughly auto-configured when recollindex starts, based on the number of
    processors, but it is also possible to taylor it in the configuration.There
    is a <a
    href="http://www.recoll.org/usermanual/usermanual.html#RCL.INSTALL.CONFIG.RECOLLCONF.IDXTHREADS">section
    in the manual</a> to describe the configuration, and also some <a
    href="http://www.recoll.org/idxthreads/threadingRecoll.html">notes about
    the transformation and the performance improvements</a>. </li>
  <li>There is a new result list/table popup menu option to display all the
    sub-documents for a given one. This is mostly useful to display the
    attachments to an email. The resulting screen can be used to select
    multiple entries and save them to files.</li>
  <li>It is now possible to use OR with "dir:" clauses, and wildcards have been
    enabled.</li>
  <li>When the option to follow symbolic links is not set -which is the
    default- symbolic links are now indexed as such (name and
    content).</li>
  <li>The advanced search panel now has a history feature. Use the
    up/down arrows to walk the search history list.</li>
  <li>There are new GUI configuration options to run in "search as you type"
    mode (which I don't find useful at all...), and to disable the Qt
    auto-completion inside the simple search string. The completion was often
    more confusing and ennoying than useful, especially because it is
    case-insensitive when case sometimes matter for Recoll searches
    (capitalization to disable stemming).</li>
  <li>When the option to collapse identical results is used, documents which do
    have duplicates are shown with a link to list the clones. This function
    needs new data from the index, so it will only completely work after a full
    1.19 reindex.</li>
  <li>Recoll should now behave reasonably on video files: index the name and
    propose an Open button in the result list to start the configured
  player.</li>
  <li>Thanks to Recoll user <a href="https://github.com/koniu">Koniu</a>, you
    can now access your Recoll indexes through a Web browser interface. The
    server side is based on the <a href="http://bottlepy.org/docs/dev/">Bottle
    Python Web framework</a> and the Recoll Python module, and can run
    self-contained (no necessity to run apache or another web server), so it's
    quite simple to set up. See: See the <a
    href="https://github.com/koniu/recoll-webui/">Recoll WebUI project</a> on
    GitHub. </li>
  <li>Thanks to Recoll user David, there is now a filter to index and retrieve
    <b>Lotus Notes</b> messages. See the software <a
    href="http://sourceforge.net/projects/rcollnotesfiltr/">site on
    sourceforge</a> and some <a
    href="http://richardappleby.wordpress.com/2013/04/11/you-dont-have-to-know-the-answer-to-everything-just-how-to-find-it/">notes</a>
    from a user with a slightly different configuration.</li>
  <li>There is a new path translation facility, with a GUI interface, to make
    it easier to share an index from a network share on clients on which the
    mount points might be different. This could also probably be put to use to
    design a "portable index" feature (for removable media).</li>
  <li>The first indexing run after Recoll installation (for a new user) will
    run in a fashion which will put data likely to be useful into the index
    faster, so that an impatient user can more quickly try searches.</li>
  <li>Implemented cache for last file uncompressed. This will much improve
    usage, e.g. for people fetching successive messages from a compressed mail
    folder.</li>
  <li>Recollindex will now change its current directory to a temporary one
    (e.g. /tmp) to mitigate the problems of some filters creating temporary
    files and not cleaning them.</li>
  <li>There is a new recursive explicit reindex option to the command line
    indexer.</li>
  <li>The default result list paragraph format has been slightly tweaked
    (removed the relevance percentage and small ordering and formatting
    changes).</li>
  <li>Mime type wildcard expansion is now performed against the index, not the
    configuration. This fixes many problems when searching for, e.g., media
    files indexed only by name.</li>
  <li>The choice for case/diacritics sensitivity is now fully processed during
    wildcard expansion (for case-sensitive indexes).</li>
  <li>The Snippets popup (list of pages and excerpts typically produced for PDF
    documents) can now use an external CSS stylesheet. This is useful because
    the Qt Webkit objects do not fully inherit the Qt configuration so that,
    for example, a style sheet is needed for using a different background
    color. The style sheet is chosen from the <tt>Preferences-&gt;GUI
    configuration-&gt;Result list</tt> panel.</li>
  <li>Improved handling of filters during indexing resulting in less
    subprocesses.</li>
  <li>Added function to import tags from external application (e.g. Tmsu).</li>
  <li>Changed format for rclaptg field. Was colon-separated, now uses normal
    value/attributes syntax with an empty value like: 
    <pre>            localfields = ; attr1 = val1 ; attr2 = val2
    </pre>
  </li>

  <li>Extended file attributes are now indexed by default. As a side effect,
    recoll now uses st_ctime, not st_mtime to detect file changes. This means
    that installing 1.19 will reindex many files (all those that were modified
    since created). Recoll also now processes the <tt>charset</tt> and
    <tt>mime_type</tt> standardized extended attributes.</li>

  <li>The Python module has been expanded to include the interface for
    extracting data. This means that you could now write most of the Recoll GUI
    in Python if you wished. There is a <a
    href="https://opensourceprojects.eu/p/recoll1/code/ci/144da4a5caa2b39d23d9d7cf262f03b6d80a4739/tree/src/python/samples/recollgui/qrecoll.py">bit
    of sample code</a> in the source package doing just this. A few
    incompatible changes had to be made to the Python module. Especially the
    "Query.next" field is gone and the module structure has been changed
    (different import statement needed). Adapting your code is trivial, have a
    look at the changes in the <a
    href="https://bitbucket.org/medoc/recoll/src/5b4bd9ef26a10912bf8bd833fe6c084bd5a7bdbd/src/desktop/unity-lens-recoll/recollscope/rclsearch.py?at=default">Unity
    Lens module</a> for an example. The new module is compatible with the <a
    href="http://www.python.org/dev/peps/pep-0249/">Python Database API
    Specification v2.0</a> for the parts that make sense for a non-relational
    DB.</li>
  <li>Recoll now uses a dynamic library for the code shared by the query
    interface, the indexer and the Python module. This should have no visible
    impact but was rendered necessary by the Python module evolutions.</li>
  <li>And quite a few <a href="BUGS.html#b_1_18_2">Fixed bugs</a></li>
</ul>
</div>
</body>
</html>