<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Recoll updated filters</title>
<meta name="generator" content="HTML Tidy, see www.w3.org">
<meta name="Author" content="Jean-Francois Dockes">
<meta name="Description" content=
"recoll is a simple full-text search system for unix and linux
based on the powerful and mature xapian engine">
<meta name="Keywords" content=
"full text search, desktop search, unix, linux">
<meta http-equiv="Content-language" content="en">
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
<meta name="robots" content="All,Index,Follow">
<link type="text/css" rel="stylesheet" href="../styles/style.css">
</head>
<body>
<div class="rightlinks">
<ul>
<li><a href="../index.html">Home</a></li>
<li><a href="../download.html">Downloads</a></li>
<li><a href="../usermanual/index.html">User manual</a></li>
<li><a href="../usermanual/rcl.install.html">Installation</a></li>
<li><a href="../index.html#support">Support</a></li>
</ul>
</div>
<div class="content">
<h1>Updated filters for Recoll</h1>
<p>The following sections describe new and updated filters. They will
be part of the next release, but can be installed on the current
release if you need them.</p>
<p>For updated filters, you just need to copy the script to the
filters directory, which is typically either <span
class="filename">/usr/share/recoll/filters</span> or <span
class="filename">/usr/local/share/recoll/filters</span>. Please check
that the script is executable after copying it, and make it so if
needed (<tt>chmod a+x <i>scriptname</i></tt>).</p>
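<p>The copy-and-chmod step can be sketched as a small shell function.
This is only an illustration: the name <tt>install_filter</tt> is
hypothetical and not part of Recoll, and the filters directory has to
be adjusted to your installation. Writing to a system directory
usually requires root privileges.</p>

```shell
# Hypothetical helper, not shipped with Recoll: copy a filter script
# into a filters directory and make sure it is executable.
install_filter() {
    script="$1"        # path to the downloaded filter script
    filters_dir="$2"   # e.g. /usr/share/recoll/filters (assumption: adjust to your system)

    # Copy the script; stop here if the copy fails.
    cp "$script" "$filters_dir/" || return 1

    # Make the installed copy executable for everybody (chmod a+x).
    chmod a+x "$filters_dir/$(basename "$script")"
}
```

<p>For example, <tt>install_filter rclepub /usr/share/recoll/filters</tt>
would install a downloaded <tt>rclepub</tt> script.</p>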
<p>For new filters, you'll need to copy the script file as
above, possibly install the supporting application, and usually
edit the
<span class="filename">mimemap</span>,
<span class="filename">mimeview</span> and
<span class="filename">mimeconf</span> files, either in the
shared directory
(<span class="filename">
/usr[/local]/share/recoll/examples</span>), or
in your personal configuration directory
(<span class="filename">$HOME/.recoll</span> or
<span class="filename">$RECOLL_CONFDIR</span>).</p>
<p>Alternatively, you can replace your system files with
these updated and complete versions:
<a href="mimemap">mimemap</a>
<a href="mimeconf">mimeconf</a>
<a href="mimeview">mimeview</a> </p>
<blockquote>
<p>There is a new rclepub filter for EPUB ebooks. It is not included
in Recoll versions before 1.18.0.</p>
<p>rclchm needs to be updated for all Recoll versions up
to and including 1.17.1.</p>
<p>If you are running an older Recoll version, you really
should upgrade.</p>
</blockquote>
<h2>EPUB documents</h2>
<p>New <a href="rclepub">rclepub</a> filter for EPUB documents.
This needs
the <a href="http://pypi.python.org/pypi/epub/0.5.0">
Python epub decoding module</a>. The mimemap, mimeview and
mimeconf files in this directory have the appropriate
entries.</p>
<h2>Updated Open Document filter</h2>
<p>The <a href="rclsoff">new filter</a> will correctly handle
exported Google Docs
documents and also, in some cases, OpenOffice/LibreOffice ones. The
previous filter concatenated all the text inside exported
Google Docs documents without any spacing.</p>
<h2>TAR archives</h2>
<p>New <a href="rcltar">rcltar</a> filter for tar archives. The
indexing of tar archives is disabled by default in the sample
configuration (stored here). To enable it, you'll need to add
an <tt>application/x-tar = execm rcltar</tt> line in the
[index] section of the mimeconf file in your personal configuration
directory ($HOME/.recoll/mimeconf by default).</p>
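<p>As a rough sketch, the line could be added to a personal mimeconf
with a shell function like the one below. The function name
<tt>enable_tar_indexing</tt> is hypothetical, and the script assumes
a plain-text mimeconf in which the <tt>[index]</tt> header sits alone
at the start of a line. Editing the file by hand works just as
well.</p>

```shell
# Hypothetical helper, not shipped with Recoll: make sure the rcltar
# entry is present in the [index] section of a mimeconf file.
enable_tar_indexing() {
    conf="$1"   # e.g. "$HOME/.recoll/mimeconf" (assumption: default personal config dir)
    line='application/x-tar = execm rcltar'

    # Nothing to do if the entry is already there.
    if grep -qF "$line" "$conf" 2>/dev/null; then
        return 0
    fi

    if grep -q '^\[index]' "$conf" 2>/dev/null; then
        # Insert the entry right after the existing [index] header.
        awk -v entry="$line" '{ print } /^\[index]/ { print entry }' \
            "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
    else
        # No [index] section yet: create one at the end of the file.
        printf '[index]\n%s\n' "$line" >> "$conf"
    fi
}
```

<p>It would be called as
<tt>enable_tar_indexing "$HOME/.recoll/mimeconf"</tt>; calling it a
second time leaves the file unchanged.</p>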
<h2>XML files</h2>
<p>By default, the current Recoll version does not index XML
content (except for known formats like dia, svg etc.). This
new <a href="rclxml">rclxml</a> filter will extract the data
from any XML file. Only text data is extracted, not attribute
values. The other option is to treat XML files as plain text
(see the comment in mimeconf) and index everything, including
a lot of garbage.</p>
<h2>DIA files</h2>
<p><a href="rcldia">rcldia</a> is a new filter
for <a href="http://projects.gnome.org/dia/">Dia</a> files,
contributed by Stefan Friedel.</p>
<h2>CHM files</h2>
<p><a href="rclchm">rclchm</a>. The previous version of the
filter mishandled files which had encoded internal URLs (not
very frequent, but it happens).</p>
<h2>Okular annotations</h2>
<p><a href="rclokulnote">rclokulnote</a>. Okular lets you create
annotations for PDF documents and stores them in XML format
somewhere under ~/.kde. This filter does not do a nice job of
formatting the data, but will at least let you find it.</p>
<h2>Gnumeric</h2>
<p><a href="rclgnm">rclgnm</a>. Needs xsltproc and
gunzip. As <tt>.gnumeric</tt> was in the list of
explicitly ignored suffixes, you can't just add the mime
and indexer script lines to your local mimemap and mimeconf. You
also need to define recoll_noindex in the local mimemap (to
override the system one, which
contains <tt>.gnumeric</tt>). The simplest approach may be to
just replace the system files with those above.</p>
<h2>Rar archive support</h2>
<p><a href="rclrar">rclrar</a>. This is up to date in Recoll
1.16.2 but can be added to Recoll 1.15. It needs the Python
rarfile module.</p>
<h2>Mimehtml support</h2>
<p>This is based on the internal mail filter, so you just need to
download and install the configuration files (mimemap and
mimeconf). It will only work with Recoll 1.15 and later.</p>
<h2>Konqueror webarchive (.war) filter</h2>
<p><a href="rclwar">rclwar</a></p>
<h2>Updated zip archive filter</h2>
<p>The filter is corrected to handle UTF-8 paths in zip archives:
<a href="rclzip">rclzip</a>. It is up to date in Recoll 1.16, but
may be useful with Recoll 1.15.</p>
<h2>Updated audio tag filter</h2>
<p>The mutagen-based rclaudio filter delivered with Recoll 1.14.2
used a very recent mutagen interface, which probably only works with
mutagen versions after 1.17 (it at least works with 1.19 and
does not with 1.15).
You can download the <a href="rclaudio">corrected script
here</a>. Not useful with Recoll 1.5 or 1.6.
</p>
</div>
</body>
</html>