
A more complete version of this document can be found at http://www.recoll.org


     * Home
     * Screenshots
     * Credits
     * Downloads
     * Installation
     * User manual

Recoll

   Recoll is a personal full text search package for Linux, FreeBSD and other
   Unix systems.

   Recoll is built on a robust backend (Xapian), for which it provides an
   easy-to-use, feature-rich interface that is simple to administer.

   Recoll is free software released under the GPL; see COPYING inside the
   distribution. Much of the code is imported from other packages; see the
   Credits.

  Features:

     * Qt-based GUI.
     * Supports the following document types:
          * text.
          * HTML.
          * OpenOffice files.
          * maildir and mailbox mail folders (Mozilla and Thunderbird mail
            folders work).
          * PDF (with pdftotext).
          * PostScript (with ghostscript's pstotext).
          * MS Word (with antiword).
          * RTF (with unrtf).
          * gaim log files.
       along with their compressed versions.
     * Powerful query facilities: boolean searches, phrase searches, and
       filtering by file type and directory tree.
     * Support for multiple charsets. Internal processing and storage uses
       Unicode UTF-8.
     * Stemming is performed at query time (so you can switch the stemming
       language after indexing).
     * Easy installation. No database daemon, web server or exotic language
       necessary.
     * An indexer which runs either as a thread inside the GUI or as an
       external, cron'able program.

   Recoll has been compiled and tested on FreeBSD, Linux, Darwin and Solaris
   (specifically FreeBSD 5.3, Red Hat 7.3 and Solaris 8; releases close to
   these should work too). The source code can be downloaded from
   http://www.recoll.org.

  Future evolutions

   Things hopefully coming in the not too distant future (especially with
   some help):

     * Support for the more advanced Xapian concepts like relevance feedback.
     * An interactive configuration tool.
     * RPMs or other kinds of packages.
     * A more polished user interface with online help and better
       documentation.
     * More translations for the user interface.
     * A few more filters for less common file types.
     * Integration with the KDE desktop.

   I very much welcome suggestions or (gasp) code.

   I hope this can be useful to somebody; it already is to me.

  Credits

   Recoll borrows (steals?) heavily from the following projects. I tried to
   include the relevant copyright attributions with the code. Any omission is
   unintentional and will be fixed as soon as I am notified.

     * Xapian: The database module (core) is used unmodified, and quite a lot
       of code has been borrowed from Omega, the web-based search application
       (i.e. the HTML parser, plus miscellaneous bits and ideas).
     * Estraier: Miscellaneous pieces of code and ideas, especially for
       charset handling, and code from external filters.
     * Unac: For accent removal. This is a relatively small package that is
       not that easy to find, so it has been integrated almost unmodified
       into the Recoll package.
     * Iconv: For character set conversion.
     * Binc IMAP: For MIME parsing code.
     * I fear that bugs found elsewhere are mostly mine:
       jean-francois.dockes@wanadoo.fr

Introduction: full text search.

   A full text search program will let you search for data by specifying the
   terms that you think appear in the content you are looking for.

   You do not need to remember in what file or email message you stored a
   given piece of information. You just ask for related terms, and the tool
   will return a list of documents where those terms are prominent.

   In addition, the tool will automatically expand your search to terms
   related to the ones you specified. For example, a search for floor will
   also look for floors, flooring, etc. With Recoll you can disable this
   expansion when entering the query.

   Recoll, like most such search tools, works by remembering where terms
   appear in your document files. The acquisition process is called
   indexation. The resulting database can be big: in practice, it is roughly
   the size of the original document set.
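
   The idea of remembering where terms appear can be illustrated with a toy
   inverted index. This is only a sketch of the general technique, not
   Recoll's or Xapian's actual data structure; the build_index function and
   the sample documents are made up for the illustration.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercased term to {path: [word positions]} (illustrative only)."""
    index = defaultdict(dict)
    for path, text in documents.items():
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(path, []).append(position)
    return index


# Hypothetical sample documents, keyed by file path.
documents = {
    "notes/house.txt": "the floor needs a new coat of varnish",
    "mail/quote.txt": "quote for the kitchen floor and the hall floor",
}
index = build_index(documents)

# Looking up a term gives every file that contains it, with word positions.
print(index["floor"])
```

   Storing one entry per term occurrence is also why such a database can
   grow to a size comparable to the documents themselves.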

   Recoll is not a document archive. It can only display data from files that
   still exist where they lived when they were indexed.

Using Recoll

  Indexation

   By default, Recoll will index your home directory. If you want to change
   this, you need to edit the configuration file ($HOME/.recoll/recoll.conf
   or $RECOLL_CONFDIR/recoll.conf if RECOLL_CONFDIR is set). Follow the
   comments in the file to adjust the parameters.
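
   The lookup rule for the configuration file can be written down as a small
   helper. This is a sketch of the rule as described above, not code taken
   from Recoll; the function name recoll_config_path is made up, and treating
   an empty RECOLL_CONFDIR as unset is an assumption.

```python
import os


def recoll_config_path(environ=None):
    """Return the recoll.conf location implied by the environment."""
    env = os.environ if environ is None else environ
    confdir = env.get("RECOLL_CONFDIR")
    if confdir:  # Assumption: an empty value is treated like an unset one.
        return os.path.join(confdir, "recoll.conf")
    home = env.get("HOME", os.path.expanduser("~"))
    return os.path.join(home, ".recoll", "recoll.conf")


print(recoll_config_path())
```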

   Indexation is performed either by running the recollindex program or by
   starting the indexing thread inside the recoll program (use the File
   menu).

   It is best to avoid interrupting the indexation process, as this may
   sometimes leave the database in a bad state. This is not a serious
   problem, as you then just need to clear everything and restart the
   indexation. The database files are normally stored in the
   $HOME/.recoll/xapiandb directory, which you can just delete when needed.
   Alternatively, you can start recollindex -z, which will reset the database
   before indexing.
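
   If you drive the indexer from a script, something like the following
   wrapper may help. It is a minimal sketch: it assumes recollindex is on
   your PATH and reports failure through its exit status, and the helper
   names are made up. Only the -z option mentioned above is used.

```python
import subprocess


def recollindex_command(reset=False):
    """Build the argument list; -z resets the database before indexing."""
    command = ["recollindex"]
    if reset:
        command.append("-z")
    return command


def run_indexer(reset=False):
    """Run recollindex; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(recollindex_command(reset), check=True)


# Example: run_indexer(reset=True) starts again from an empty database.
```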

  Simple search

   Start the recoll program, then enter search term(s) in the text field at
   the top left of the window. Clicking the Search button or hitting the
   Enter key will start a search. By default, this will look for documents
   with any of the terms (the ones with more terms will get better scores).
   Use the Tools / Advanced search dialog for other kinds of searches.
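
   The default "any of the terms" behaviour can be pictured with the toy
   ranking below. It simply counts how many distinct query terms each
   document contains; the real relevance computation done by Xapian is more
   elaborate, and the function name and sample data are made up.

```python
def rank_any_term(query, documents):
    """Return (path, matched_term_count) pairs, best match first."""
    terms = set(query.lower().split())
    scored = []
    for path, text in documents.items():
        matched = len(terms & set(text.lower().split()))
        if matched:  # Documents with none of the terms are not returned.
            scored.append((path, matched))
    # Sort by descending count, then by path for a predictable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical sample documents.
documents = {
    "a.txt": "recoll user manual",
    "b.txt": "xapian manual",
    "c.txt": "shopping list",
}
results = rank_any_term("recoll manual", documents)
print(results)
```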

   A list of results will be displayed in the main list window. Clicking on
   an entry will open an internal preview window for the document.
   Double-clicking will attempt to start an external viewer (have a look at
   the ~/.recoll/mimeconf file to see how these are configured).

   Documents that you actually view (with the internal preview or an external
   tool) are entered into the document history, which is remembered. You can
   display the history list by using the Tools / Doc History menu entry.

   By default, the document list is presented in order of relevance (how well
   the system estimates that the document matches the query). You can specify
   a different ordering by using the Tools / Sort parameters dialog.

  Search tips, shortcuts

   Entering a capitalized word in any search field will prevent stem
   expansion (example: Recoll will not look for gardening if you enter Garden
   instead of garden). This is the only case where character case will make a
   difference for a Recoll search.
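
   The capitalization rule can be sketched as follows. The variant table
   stands in for a real stemmer and is purely hypothetical, as is the
   expand_term name; lowercasing the capitalized word reflects the statement
   above that case otherwise makes no difference.

```python
def expand_term(term, variants):
    """Return the terms to search for, given a table of stem variants."""
    if term[:1].isupper():
        # Capitalized input: no stem expansion, search for the word itself.
        return [term.lower()]
    return [term] + variants.get(term, [])


# Hypothetical variant table; real stemming is language-dependent.
variants = {"garden": ["gardens", "gardening"]}

print(expand_term("garden", variants))
print(expand_term("Garden", variants))
```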

   A phrase can be looked for by enclosing it in double quotes. Example:
   "user manual" will look only for occurrences of user immediately followed
   by manual.
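
   What "immediately followed by" means can be shown with a small check on
   word sequences. This is an illustration of phrase matching in general,
   not Recoll's implementation; contains_phrase is a made-up name.

```python
def contains_phrase(text, phrase):
    """True if the words of phrase occur consecutively, in order, in text."""
    words = text.lower().split()
    wanted = phrase.lower().split()
    if not wanted:
        return False
    last_start = len(words) - len(wanted)
    return any(
        words[i:i + len(wanted)] == wanted for i in range(last_start + 1)
    )


print(contains_phrase("read the user manual first", "user manual"))
print(contains_phrase("the manual for each user", "user manual"))
```

   The second text contains both words but not next to each other in that
   order, so it does not match the phrase.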

   Typing ^Q (Ctrl-Q) almost anywhere will close the application.

   Typing ^W (Ctrl-W) in a preview tab will close it (and, for the last tab,
   close the preview window).

  Complex/advanced search

   The advanced search dialog has fields that will allow a more refined
   search, looking for documents with all given words, a given exact phrase,
   or none of the given words (all fields may be combined by an implicit AND
   clause).

   It will let you search for documents of specific MIME types (e.g. only
   text/plain, text/html or application/pdf).

   It will let you restrict the search results to a subtree of the indexed
   area.
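
   How these criteria combine can be sketched as a single predicate in which
   every filled-in field must hold (the implicit AND). The Doc class, the
   matches function and its parameter names are invented for this
   illustration and do not correspond to Recoll's internals.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    path: str
    mime_type: str
    text: str


def matches(doc, all_words=(), phrase="", none_words=(), mime_types=(), subtree=""):
    """True if doc satisfies every criterion that was filled in."""
    words = doc.text.lower().split()
    if any(w.lower() not in words for w in all_words):
        return False  # A required word is missing.
    if any(w.lower() in words for w in none_words):
        return False  # An excluded word is present.
    if phrase:
        wanted = phrase.lower().split()
        n = len(wanted)
        if not any(words[i:i + n] == wanted for i in range(len(words) - n + 1)):
            return False  # The words never occur consecutively.
    if mime_types and doc.mime_type not in mime_types:
        return False
    if subtree and not doc.path.startswith(subtree.rstrip("/") + "/"):
        return False  # The file lives outside the requested subtree.
    return True


# Hypothetical document.
doc = Doc("/home/me/docs/manual.txt", "text/plain", "the recoll user manual")
print(matches(doc, all_words=["recoll"], mime_types=["text/plain"],
              subtree="/home/me/docs"))
```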

   In other respects, it works like the simple search.