Supporting packages

Note

The Windows installation of Recoll is self-contained, and only needs Python 2.7 to be externally installed. Windows users can skip this section.

Recoll uses external applications to index some file types. You only need to install them for the file types that you want indexed. These are optional run-time dependencies: none of them is needed to build or run Recoll, only to index its specific file type.

After an indexing pass, the commands that were found missing can be displayed from the recoll File menu. The list is stored in a text file named missing inside the configuration directory.

A list of common file types that need external commands follows. Many of the handlers need the iconv command, which is not always listed as a dependency.

Because this information changes relatively often, the most up-to-date version is now kept at http://www.recoll.org/features.html, along with links to the home pages or the best source/patch pages, and miscellaneous tips. The list below is not updated often and may be quite stale.

For many Linux distributions, most of the commands listed can be installed from the package repositories. However, the packages are sometimes outdated, or not the best version for Recoll, so you should take a look at http://www.recoll.org/features.html if a file type is important to you.

As of Recoll release 1.14, a number of XML-based formats that were handled by ad hoc handler code now use the xsltproc command, which usually comes with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.

Now for the list:

  • Openoffice files need unzip and xsltproc.

  • PDF files need pdftotext, which is part of Poppler (it usually comes with the poppler-utils package). Avoid the original version from Xpdf.

  • PostScript files need pstotext. The original version has an issue with shell characters in file names, which is corrected in recent packages. See http://www.recoll.org/features.html for more detail.

  • MS Word needs antiword. It is also useful to have wvWare installed, as it may be used as a fallback for some files which antiword does not handle.

  • MS Excel and PowerPoint are processed by internal Python handlers.

  • MS Open XML (docx) needs xsltproc.

  • Wordperfect files need wpd2html from the libwpd (or libwpd-tools on Ubuntu) package.

  • RTF files need unrtf. Older versions of unrtf have a lot of trouble with non-Western character sets, and many Linux distributions carry outdated versions. Check http://www.recoll.org/features.html for details.

  • TeX files need untex or detex. Check http://www.recoll.org/features.html for sources if it's not packaged for your distribution.

  • dvi files need dvips.

  • djvu files need djvutxt and djvused from the DjVuLibre package.

  • Audio files: Recoll releases 1.14 and later use a single Python handler based on mutagen for all audio file types.

  • Pictures: Recoll uses the Exiftool Perl package to extract tag information. Most image file formats are supported. Indexing the technical tags (image size, aperture, etc.) may not be very useful; it is mainly of interest if you store personal tags or textual descriptions inside the image files.

  • chm: files in Microsoft help format need Python and the pychm module (which needs chmlib).

  • ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar module. icalendar is not needed for newer versions, which use internal code.

  • Zip archives need Python (and the standard zipfile module).

  • Rar archives need Python, the rarfile Python module and the unrar utility.

  • Midi karaoke files need Python and the Midi module.

  • Konqueror webarchive format needs Python (it uses the tarfile module).

  • Mimehtml web archive format: support is based on the email handler, which introduces some mild weirdness, but is still usable.
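Several of the handlers above depend on Python modules rather than external commands. The Python 3 sketch below checks whether the running interpreter can find the modules named in the list. It only tells you about the interpreter that runs it (Python 3.4 or later, for importlib.util.find_spec); if your Recoll handlers run under a different interpreter (the Windows note above mentions Python 2.7), the result does not apply to that one. The mapping is transcribed by hand and is illustrative only, and the import names are assumptions where the text only gives a package name: pychm is assumed to provide a module named chm, and the Midi module is left out because its import name is not given.

```python
import importlib.util

# Illustrative mapping transcribed from the list above (an assumption, not
# Recoll's own configuration). pychm is ASSUMED to install a module named "chm";
# the Midi module is omitted because its import name is not stated.
PYTHON_MODULES = {
    "Audio files": ["mutagen"],
    "CHM help files": ["chm"],
    "Zip archives": ["zipfile"],
    "Rar archives": ["rarfile"],
    "Konqueror webarchive": ["tarfile"],
}


def missing_modules(names):
    """Return the top-level module names the running interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Print one line per file type with at least one module missing.
    for file_type, names in PYTHON_MODULES.items():
        missing = missing_modules(names)
        if missing:
            print(f"{file_type}: missing Python module(s) {', '.join(missing)}")
```

zipfile and tarfile are part of the standard library, so they should never be reported; mutagen, chm and rarfile are third-party and may be.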

Text, HTML, email folders, and Scribus files are processed internally. Lyx is used to index Lyx files. Many handlers need iconv and the standard sed and awk.
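To check a system before the first indexing pass, a small Python 3 script can look up the external commands named in this section on the PATH. This is a rough sketch, not part of Recoll: the mapping from file types to commands is transcribed by hand from the list above and may be incomplete, and Recoll's own missing file, written after indexing, remains the authoritative report. A tuple stands for alternatives, any one of which is enough (untex or detex for TeX).

```python
import shutil

# Illustrative mapping transcribed from the list above (an assumption, not
# Recoll's configuration). A tuple means "any one of these commands is enough".
REQUIRED_COMMANDS = {
    "Many handlers": ["iconv", "sed", "awk"],
    "OpenOffice": ["unzip", "xsltproc"],
    "XML-based formats (abiword, fb2, kword, svg, docx)": ["xsltproc"],
    "PDF": ["pdftotext"],
    "PostScript": ["pstotext"],
    "MS Word": ["antiword"],  # wvWare is an optional fallback, not checked here
    "WordPerfect": ["wpd2html"],
    "RTF": ["unrtf"],
    "TeX": [("untex", "detex")],
    "DVI": ["dvips"],
    "DjVu": ["djvutxt", "djvused"],
    "Rar archives": ["unrar"],
    "Lyx": ["lyx"],
}
# Pictures are not listed: the text names the Exiftool Perl package, not a command.


def is_available(requirement):
    """True if the command, or any one of a tuple of alternatives, is on PATH."""
    alternatives = requirement if isinstance(requirement, tuple) else (requirement,)
    return any(shutil.which(cmd) is not None for cmd in alternatives)


def find_missing(requirements):
    """Return the requirements (commands or tuples of alternatives) not satisfied."""
    return [req for req in requirements if not is_available(req)]


def describe(requirement):
    """Format a requirement for display: ('untex', 'detex') -> 'untex or detex'."""
    return " or ".join(requirement) if isinstance(requirement, tuple) else requirement


def report():
    """Print one line per file type whose commands are not all available."""
    for file_type, requirements in REQUIRED_COMMANDS.items():
        missing = find_missing(requirements)
        if missing:
            print(f"{file_type}: missing {', '.join(describe(r) for r in missing)}")


if __name__ == "__main__":
    report()
```

Run it with python3; it prints nothing when every listed command is found. A command being present does not guarantee it is a suitable version (see the notes on pdftotext, pstotext and unrtf above).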