--- a/website/features.html
+++ b/website/features.html
@@ -80,11 +80,11 @@
 tested on Linux, Darwin and Solaris (initial versions Redhat 7,
 Fedora Core 5, Suse 10, Gentoo, Debian 3.1, Solaris 8). It
 should compile and run on all subsequent releases of these
 systems and probably a few others too.</p>
 
-<p>Qt versions from 3.1 to 4.7</p>
+<p>Qt versions from 3.1 to 4.8</p>
 
 <h2><a name="doctypes">Document types</a></h2>
 
 <p>Recoll can index many document types (along with their
 compressed versions). Some types are handled internally (no
@@ -94,33 +94,28 @@
 are listed in the native section.</p>
 
 <h4>File types indexed natively</h4>
 
 <ul>
-<li><span class="literal">text</span>.</li>
-
-<li><span class="literal">html</span>.</li>
-
-<li><span class="literal">maildir</span> and
-<span class="literal">mailbox</span> (
-<span class="literal">Mozilla</span>,
-<span class="literal">Thunderbird</span> and
-<span class="literal">Evolution</span>mail ok).
+<li><span class="application">text</span>.</li>
+<li><span class="application">html</span>.</li>
+<li><span class="application">maildir</span> and
+<span class="application">mailbox</span> (
+<span class="application">Mozilla</span>,
+<span class="application">Thunderbird</span> and
+<span class="application">Evolution</span> mail ok).
 </li>
 
-<li><span class="literal">gaim</span> and
-<span class="literal">purple</span> log files.</li>
+<li><span class="application">gaim</span> and
+<span class="application">purple</span> log files.</li>
 
-<li><span class="literal">Lyx</span> files (needs
-<span class="literal">Lyx</span> to be installed).</li>
-
-<li><span class="literal">Scribus</span> files.</li>
+<li><span class="application">Scribus</span> files.</li>
 
-<li><span class="literal">Man pages</span> (needs
-<span class="command">groff</span>).</li>
+<li><span class="application">Man pages</span> (needs
+<span class="application">groff</span>).</li>
 
-<li><span class="literal">Dia</span> diagrams.</li>
+<li><span class="application">Dia</span> diagrams.</li>
 </ul>
 
 <h4>File types indexed with external helpers</h4>
 
 <p>Many document types need the <span class="command">iconv</span>
@@ -131,84 +126,100 @@
 <p>The following types need <span class="command">
 xsltproc</span> from the <b>libxslt</b> package.
 Quite a few also need <span class="command">unzip</span>:</p>
 
 <ul>
-<li><span class="literal">Abiword</span> files.</li>
+<li><span class="application">Abiword</span> files.</li>
 
-<li><span class="literal">Fb2</span> ebooks.</li>
+<li><span class="application">Fb2</span> ebooks.</li>
 
-<li><span class="literal">Kword</span> files.</li>
+<li><span class="application">Kword</span> files.</li>
 
-<li><span class="literal">Microsoft Office Open XML</span>
+<li><span class="application">Microsoft Office Open XML</span>
 files.</li>
 
-<li><span class="literal">OpenOffice</span> files.</li>
+<li><span class="application">OpenOffice</span> files.</li>
 
-<li><span class="literal">SVG</span> files.</li>
-<li><span class="literal">Gnumeric</span> files.</li>
-<li><span class="literal">Okular</span> annotations files.</li>
+<li><span class="application">SVG</span> files.</li>
+<li><span class="application">Gnumeric</span> files.</li>
+<li><span class="application">Okular</span> annotations files.</li>
 
 </ul>
 
 <h5>Other formats</h5>
 
 <p>The following need miscellaneous helper programs to decode
 the internal formats.</p>
 
 <ul>
-<li><span class="literal">pdf</span> with the <span class=
+<li><span class="application">pdf</span> with the <span class=
 "command">pdftotext</span> command, which can be installed
 as part of <a href="http://www.foolabs.com/xpdf/">xpdf</a>
 or <a href="http://poppler.freedesktop.org/">poppler</a>,
 depending on your distribution.</li>
 
-<li><span class="literal">msword</span> with <a href=
+<li><span class="application">msword</span> with <a href=
 "http://www.winfield.demon.nl/">antiword</a>. It is also useful to
 have <a href="http://wvware.sourceforge.net/">wvWare</a> installed
 as it may be be used as a fallback for some files which antiword
 does not handle.</li>
 
+<li><span class="application">Wordperfect</span> with the
+<span class="command">wpd2html</span> command from <a href=
+"http://libwpd.sourceforge.net">libwpd</a>. On some distributions,
+the command may come with a package named <span
+class="literal">libwpd-tools</span> or such, not the base <a
+span="literal">libwpd</a> package.</li>
+
+
+<li><span class="application">Lyx</span> files (needs
+<span class="application">Lyx</span> to be installed).</li>
+
-<li><span class="literal">Powerpoint</span> and <span
-class="literal">Excel</span> with the <a href=
+<li><span class="application">Powerpoint</span> and <span
+class="application">Excel</span> with the <a href=
 "http://vitus.wagner.pp.ru/software/catdoc/">catdoc</a> utilities.</li>
 
-<li><span class="literal">CHM (Microsoft help)</span> files
+<li><span class="application">CHM (Microsoft help)</span> files
 with <span class="command">Python,
 <a href="http://gnochm.sourceforge.net/pychm.html">pychm</a>
 and <a href="http://www.jedrea.com/chmlib/">chmlib</a></span>.</li>
 
-<li><span class="literal">GNU info</span> files
+<li><span class="application">GNU info</span> files
 with <span class="command">Python</span> and the
 <span class="command">info</span> command.</li>
 
+<li><span class="application">Tar</span> archives (needs <span
+class="command">Python</span>). Tar file indexing is disabled
+by default (because tar archives don't typically contain the
+kind of documents that people search for), you will need to
+enable it explicitely, like with the following in your
+<span class="filename">$HOME/.recoll/mimeconf</span> file:
+<pre>
+[index]
+application/x-tar = execm rcltar
+</pre>
+</li>
+
-<li><span class="literal">Zip</span> archives (needs <span
+<li><span class="application">Zip</span> archives (needs <span
 class="command">Python</span>).</li>
 
-<li><span class="literal">Rar</span> archives (needs <span
+<li><span class="application">Rar</span> archives (needs <span
 class="command">Python</span>), the
 <a href="http://pypi.python.org/pypi/rarfile/">rarfile</a> Python
 module and the <a
 href="http://www.rarlab.com/rar_add.htm">unrar</a> utility.</li>
 
-<li><span class="literal">iCalendar</span>(.ics) files
+<li><span class="application">iCalendar</span>(.ics) files
 (needs <span class="command">Python, <a href=
 "http://pypi.python.org/pypi/icalendar/2.1">icalendar</a></span>).</li>
 
-<li><span class="literal">Mozilla calendar data</span> See
+<li><span class="application">Mozilla calendar data</span> See
 <a href=
 "http://bitbucket.org/medoc/recoll/wiki/IndexMozillaCalendari">
 the wiki</a> about this.</li>
 
-<li><span class="literal">Wordperfect</span> with the
-<span class="command">wpd2html</span> command from <a href=
-"http://libwpd.sourceforge.net">libwpd</a>. On some distributions,
-the command may come with an package named <span
-class="literal">libwpd-tools</span> or such, not the base <a
-span="literal">libwpd</a> package.</li>
-
-<li><span class="literal">postscript</span> with <a href=
+<li><span class="application">postscript</span> with <a href=
 "http://www.gnu.org/software/ghostscript/ghostscript.html">
 ghostscript</a> and <a href=
 "http://www.cs.wisc.edu/~ghost/doc/pstotext.htm">pstotext</a>.
 Pstotext 1.9 has a serious issue with special characters in
 file names, and you should either use the version packaged for
@@ -233,11 +244,11 @@
 </pre>
 </blockquote>
 </li>
 
 
-<li><span class="literal">RTF</span> files with <a href=
+<li><span class="application">RTF</span> files with <a href=
 "http://www.gnu.org/software/unrtf/unrtf.html">unrtf</a>. Please
 note that up to version
 0.21, <span class="command">unrtf</span> mostly does not work
 with non western-european character sets. If you have a need
 for indexing, ie, russian or chinese RTF files, I have
@@ -246,63 +257,61 @@
 download the <a href="unrtf/unrtf-0.22.2beta.tar.gz">source
 here</a>. The development is hosted
 on <a href="http://www.bitbucket.org/medoc/unrtf-int">
 bitbucket.org</a>.</li>
 
-<li><span class="literal">TeX</span> with <span class=
+<li><span class="application">TeX</span> with <span class=
 "command">untex</span>. If there is no untex package for
 your distribution, <a href="untex/untex-1.3.jf.tar.gz">a
 source package is stored on this site</a> (as untex has no
 obvious home). Will also work with <a href=
 "http://www.cs.purdue.edu/homes/trinkle/detex/">detex</a>
 if this is installed.</li>
 
-<li><span class="literal">dvi</span> with <a href=
+<li><span class="application">dvi</span> with <a href=
 "http://www.radicaleye.com/dvips.html">dvips</a>.</li>
 
-<li><span class="literal">djvu</span> with <a href=
+<li><span class="application">djvu</span> with <a href=
 "http://djvu.sourceforge.net">DjVuLibre</a>.</li>
 
-<li>Audio file tags: Recoll releases 1.13 and older use <a
-href="http://id3lib.sourceforge.net/">id3info (id3lib)</a>
-(compiling id3lib on recent systems may need a small patch,
-see <a href="id3lib.html">here.</a>) or the ogg and flac
-tools.<br>
+<li><span class="application">Audio file tags</span>.
 Recoll releases 1.14 and later use a Python filter based
 on <a href="http://code.google.com/p/mutagen/">mutagen</a>
 for all audio types.</li>
 
-<li>Image file tags with <a href=
+<li><span class="application">Image file tags</span> with <a href=
 "http://www.sno.phy.queensu.ca/~phil/exiftool/">exiftool</a>.
 This is a perl program, so you also need perl on the
 system. This works with about any possible image file and
 tag format (jpg, png, tiff, gif etc.).</li>
 
-<li>Midi karaoke files with Python, the
+<li><span class="application">Midi karaoke files</span> with
+Python, the
 <a href="http://pypi.python.org/pypi/midi/0.2.1">
 midi module</a>, and some help
 from <a href="http://chardet.feedparser.org/">chardet</a>. There
 is probably a <tt>chardet</tt> package for your distribution,
 but you will quite probably need to build the midi
 package. This is easy but see the
 to <a href="helpernotes.html#midi">notes here</a>.
 </li>
 
-<li>Konqueror webarchive format with Python (uses the tarfile
-module).</li>
+<li><span class="application">Konqueror webarchive</span>
+format with Python (uses the tarfile module).</li>
 
-<li>mimehtml web archive format (support based on the mail
+<li><span class="application">Mimehtml web archive
+format</span> (support based on the mail
 filter, which introduces some mild weirdness, but still
 usable).</li>
 
 </ul>
 
 <h2><a name="other">Other features</a></h2>
 
 <ul>
 <li>Can use <b>Beagle</b> browser plug-ins to index web
-history. See the <a href=
+history. See <a href=
 "http://bitbucket.org/medoc/recoll/wiki/IndexBeagleWeb">the
 Wiki</a> for more detail.</li>
 
 <li>Processes all email attachments, and more generally any
 realistic level of container imbrication (the "msword attachment to