git clone https://@opensourceprojects.eu/git/p/recoll1/code recoll1-code
Read Me
Recoll
This is Recoll, a personal full text indexing system.
Recoll is free software, licensed under the GPL; see COPYING inside the
distribution. A lot of the code is imported from other packages; see
the Credits.
Recoll is still in its infancy, but it is based on a very strong backend
(Xapian), and I find it quite useful right now. You might be interested in
using Recoll to index your home directory instead of using Xapian's Omega,
for example, if you do not want to run a web server, or if your data is
not ISO-8859-1. But the query features are much less sophisticated for now.
See INSTALL inside the distribution for compiling and installing. This is
very much done by hand for now; I hope things will get better in the near
future.
Features:
* Document types: text, HTML, PDF (with xpdf's pdftotext), PostScript
(with ghostscript's pstotext), MS Word (with antiword), OpenOffice
files, maildir and mailbox mail folders (Mozilla and Thunderbird mail
work). Compressed versions of these are handled too.
* Relatively powerful query facilities, with boolean searches, phrases,
and filtering by file type and directory tree.
* Support for multiple charsets. Internal processing and storage uses
Unicode UTF-8.
* Stemming performed at query time (you can switch the stemming language
after indexing).
* Easy installation. No database daemon, web server or exotic language
necessary. The idea is that EVERYBODY should index their files because
it makes life easier.
* An ugly GUI, Qt-based, written with Qt Designer.
* An indexer which runs either as a thread inside the GUI or as an
external, cron'able program.
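To illustrate why stemming at query time lets you change the stemming
language without reindexing, here is a minimal Python sketch. It is not
Recoll code: toy_stem, build_index and search are hypothetical names, the
stemmer is a crude stand-in, and expanding the query term against the
indexed terms is only one plausible way to do it. The point is that the
index stores the words as written, so the stemmer can be swapped at search
time.

```python
from collections import defaultdict


def toy_stem(word):
    """Crude English suffix stripper, for illustration only (hypothetical)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each word, exactly as written (lowercased), to its document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, term, stemmer=None):
    """Look up a term; with a stemmer, match every indexed term sharing its stem."""
    term = term.lower()
    if stemmer is None:
        return set(index.get(term, set()))
    target = stemmer(term)
    hits = set()
    for indexed_term, doc_ids in index.items():
        if stemmer(indexed_term) == target:
            hits |= doc_ids
    return hits


docs = {1: "indexing files", 2: "the index is small", 3: "indexed yesterday"}
index = build_index(docs)
print(search(index, "index"))             # exact match only
print(search(index, "index", toy_stem))   # expanded through the stemmer
```

Because the stemmer is just an argument to search, a different language's
stemmer could be passed in without touching the index.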
Recoll has been compiled and tested on FreeBSD, Linux and Solaris
(FreeBSD 5.3, Red Hat 7.3 and Solaris 8, but other not too distant
releases should be ok too).
Things lacking, coming in the not too distant future:
* An interactive configuration tool. You need to edit files by hand for
now.
* Packages, RPM or other. It's all tar files currently.
* A build system, autoconf et al.
* Documentation and help.
* A few more filters for less common file types.
I very much welcome suggestions or (gasp) code.
I hope that this can be useful to somebody; it already is for me.
Credits
Recoll borrows (steals?) heavily from the following projects. I tried to
include the relevant copyright attributions with the code. Any omission is
unintentional and will be fixed as soon as notified.
* Xapian: The database module (core) is used unmodified, and quite a lot
of code has been borrowed from Omega, the web-based search application
(i.e. the HTML parser, plus miscellaneous bits and ideas).
* Estraier: Miscellaneous pieces of code and ideas, especially for
charset handling, and code from external filters.
* Unac: for accent removal. This is a relatively small package and not
that easy to find, so it has been integrated almost unmodified in the
Recoll package.
* Iconv: for character set conversion.
* Binc IMAP: for MIME parsing code.
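As a rough illustration of what character set conversion and accent
removal do to a piece of text, here is a small Python sketch. It uses
Python's standard library as a stand-in for Iconv and Unac and does not
reflect how Recoll calls them; to_utf8 and strip_accents are hypothetical
names chosen for the example.

```python
import unicodedata


def to_utf8(raw, source_charset):
    """Decode bytes from source_charset and re-encode them as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")


def strip_accents(text):
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# A document stored as ISO-8859-1 is first converted to UTF-8, then
# unaccented so that "resume" can match "résumé".
latin1_bytes = "résumé café".encode("iso-8859-1")
utf8_bytes = to_utf8(latin1_bytes, "iso-8859-1")
print(strip_accents(utf8_bytes.decode("utf-8")))  # prints "resume cafe"
```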
jean-francois.dockes@wanadoo.fr