= Recoll WebUI Apache and nginx installation from scratch
NOTE: thanks to Michael L. Wilson for the `nginx` part.
The https://github.com/koniu/recoll-webui[Recoll WebUI] offers an
alternative, web-based interface for querying a Recoll index.
It is a convenient way to share a single index among multiple
workstations, with no need for a local Recoll installation or shared
data storage.
The Recoll WebUI is based on the
http://bottlepy.org/docs/dev/index.html[Bottle Python framework], which has
a built-in web server, and the simplest deployment approach is to run it
standalone. However, the built-in server handles only one
request at a time, which is problematic in multi-user situations,
especially because some requests, like extracting a result list into a CSV
file, can take a significant amount of time.
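To make the limitation concrete, here is a minimal sketch that uses only the standard-library `wsgiref` server, which is what Bottle's default server adapter relies on. It is not the WebUI itself: the `application` callable, the '/slow_export' path, the delay and the port are illustrative assumptions.

```python
import time
from wsgiref.simple_server import make_server

# Hypothetical duration of a long-running request (e.g. a big CSV export).
EXPORT_DELAY = 5.0


def application(environ, start_response):
    """Tiny WSGI app: '/slow_export' simulates a long-running request."""
    if environ.get("PATH_INFO", "/") == "/slow_export":
        time.sleep(EXPORT_DELAY)
        body = b"export done\n"
    else:
        body = b"search page\n"
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


def serve(host="127.0.0.1", port=8080):
    """Serve requests strictly one after the other."""
    with make_server(host, port, application) as httpd:
        httpd.serve_forever()
```

After calling `serve()`, a request for '/slow_export' keeps the server busy, and a request for '/' sent in the meantime is only answered once the slow one has completed.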
The Bottle framework can work with several multi-threaded Python HTTP
server libraries but, given the limitations of the Recoll Python module
and of the Python interpreter itself, this does not yield optimal
performance and, in particular, cannot make efficient use of the now
ubiquitous multiprocessor machines.
In multi-user situations, you can get better performance and ease of use
from the Recoll WebUI by running it under Apache or Nginx rather than as a
standalone process. With this approach, a few requests per second can
easily be handled even in the presence of long-running ones.
Neither Recoll nor the WebUI is optimized for high multi-user load, and it
would be very unwise to use them as the search interface to a busy web
site.
The instructions for using the WebUI under Apache given in the
repository README are a bit terse and are missing a few details,
especially ones which affect performance.
What follows is a synopsis of three WebUI installations: Apache on
initially Apache-less Ubuntu (14.04) and DragonFly BSD systems, and
Nginx on BSD. The first should extend easily to other Debian-based systems,
the second at least to FreeBSD. rpm-based systems are left as an exercise
for the reader, at least for now...
CAUTION: THE CONFIGURATIONS DESCRIBED HAVE NO ACCESS CONTROL. ANYONE WITH
ACCESS TO THE NETWORK WHERE THE SERVER IS LOCATED CAN RETRIEVE ANY
DOCUMENT.
link:#nginx[Jump to the nginx section].
[[apache]]
== Apache
=== On a Debian/Ubuntu system
==== Install recoll
sudo apt-get install recoll python-recoll
Configure the indexing and check that the normal search works (I spent
quite a lot of time trying to understand why the WebUI did not work, when
in fact it was the normal recoll configuration which was broken and the
regular search did not work either).
Take care to be logged in as the user you want to run the web search as
while you do this.
==== Install the WebUI
Clone the GitHub repository, or extract the master tar archive, and
move it to '/var/www/recoll-webui-master/'. Make sure that it is readable
and executable by your user.
==== Install Apache and mod-wsgi
sudo apt-get install apache2 libapache2-mod-wsgi
I then got the following message:

    AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1. Set the 'ServerName' directive globally to suppress this message

Things work without fixing this, but to suppress the message I added a
ServerName directive to the Apache configuration (you may not need it).
Edit '/etc/apache2/sites-available/000-default.conf' and add the following
at the top (globally). You will probably need to adjust the address or use
a real host name:

    ServerName 192.168.4.6

Edit '/etc/apache2/mods-enabled/wsgi.conf' and add the following at the end
of the "IfModule" section.
Change the user ('dockes' in the example), making sure that it is the one
which owns the index ('.recoll' is in its home directory).
    WSGIDaemonProcess recoll user=dockes group=dockes \
        threads=1 processes=5 display-name=%{GROUP} \
        python-path=/var/www/recoll-webui-master
    WSGIScriptAlias /recoll /var/www/recoll-webui-master/webui-wsgi.py
    <Directory /var/www/recoll-webui-master>
        WSGIProcessGroup recoll
        Order allow,deny
        allow from all
    </Directory>
NOTE: the Recoll WebUI application is mostly single-threaded, so it is of
little use (and may actually be counter-productive in some cases) to
specify multiple threads on the WSGIDaemonProcess line. Specify multiple
processes instead to put multiple CPUs to work on simultaneous requests.
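As an illustration of this advice, the following sketch builds a `WSGIDaemonProcess` line with a single thread per process. The helper name `wsgi_daemon_directive` is hypothetical, and defaulting the process count to the number of CPUs is only an assumed starting point, not a recommendation from the WebUI authors.

```python
import os


def wsgi_daemon_directive(user, group, app_dir, processes=None):
    """Build a WSGIDaemonProcess line using one thread per process."""
    if processes is None:
        # Assumption: one worker process per CPU is a reasonable start.
        processes = os.cpu_count() or 1
    return (
        f"WSGIDaemonProcess recoll user={user} group={group} "
        f"threads=1 processes={processes} display-name=%{{GROUP}} "
        f"python-path={app_dir}"
    )


print(wsgi_daemon_directive("dockes", "dockes",
                            "/var/www/recoll-webui-master", processes=5))
```

With `processes=5` this prints the same directive as the configuration above, joined on a single line.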
Then run the following to restart Apache:

    sudo apachectl restart

The Recoll WebUI should now be accessible at 'http://my.server.com/recoll/'.
NOTE: The URL used to access the search must end with a '/' (use
'http://my.server.com/recoll/', not 'http://my.server.com/recoll').
Otherwise files other than the script itself are not found: the page looks
weird and the search does not work.
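The reason is the standard way relative URLs are resolved against the page address. The sketch below shows this with `urllib.parse.urljoin`; the relative link 'static/style.css' is a hypothetical example, not necessarily a path used by the WebUI.

```python
from urllib.parse import urljoin

# Hypothetical relative link, as a page might reference a stylesheet.
RELATIVE_LINK = "static/style.css"

without_slash = urljoin("http://my.server.com/recoll", RELATIVE_LINK)
with_slash = urljoin("http://my.server.com/recoll/", RELATIVE_LINK)

print(without_slash)  # http://my.server.com/static/style.css
print(with_slash)     # http://my.server.com/recoll/static/style.css
```

Without the trailing '/', the last path segment ('recoll') is replaced instead of extended, so the browser asks for resources outside the '/recoll' alias.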
CAUTION: THERE IS NO ACCESS CONTROL. ANYONE WITH ACCESS TO THE NETWORK
WHERE THE SERVER IS LOCATED CAN RETRIEVE ANY DOCUMENT.
=== Apache Variant for BSD/ports
==== Packages
As root:
pkg install recoll
Do what you need to do to configure the indexing and check that the normal
search works.
Take care to be logged in as the user you want to run the web search as
while you do this.
pkg install apache24
Add apache24_enable="YES" to '/etc/rc.conf'.
pkg install ap24-mod_wsgi4
pkg install git
==== Clone the webui repository
cd /usr/local/www/apache24/
git clone https://github.com/koniu/recoll-webui.git recoll-webui-master
Important: most input handler helper applications (e.g. 'pdftotext') are
installed in '/usr/local/bin' which is not in the PATH as seen by Apache
(at least on DragonFly). The simplest way to fix this is to modify the
launcher module for the webui app so that it fixes the PATH.
Edit 'recoll-webui-master/webui-wsgi.py' and add the following line after
the 'import os' line:
os.environ['PATH'] = os.environ['PATH'] + ':' + '/usr/local/bin'
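A slightly more defensive variant of the same fix is sketched below. The helper name `append_to_path` is hypothetical; it avoids adding the directory twice and does not fail if 'PATH' is unset, which the one-line version would (with a `KeyError`).

```python
import os


def append_to_path(directory, environ=os.environ):
    """Append directory to PATH unless it is already listed there."""
    entries = [p for p in environ.get("PATH", "").split(os.pathsep) if p]
    if directory not in entries:
        entries.append(directory)
    environ["PATH"] = os.pathsep.join(entries)
    return environ["PATH"]
```

In 'webui-wsgi.py' this would be called once, as `append_to_path('/usr/local/bin')`, right after the imports.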
==== Configure Apache
Edit '/usr/local/etc/apache24/modules.d/270_mod_wsgi.conf'.
Uncomment the LoadModule line, and add the directives to alias /recoll/ to
the webui script.
Change the user ('dockes' in the example), making sure that it is the one
which owns the index ('.recoll' is in its home directory).
Contents of the file:
## $FreeBSD$
## vim: set filetype=apache:
##
## module file for mod_wsgi
##
## PROVIDE: mod_wsgi
## REQUIRE:
LoadModule wsgi_module libexec/apache24/mod_wsgi.so
    WSGIDaemonProcess recoll user=dockes group=dockes \
        threads=1 processes=5 display-name=%{GROUP} \
        python-path=/usr/local/www/apache24/recoll-webui-master/
    WSGIScriptAlias /recoll /usr/local/www/apache24/recoll-webui-master/webui-wsgi.py
    <Directory /usr/local/www/apache24/recoll-webui-master>
        WSGIProcessGroup recoll
        Require all granted
    </Directory>
==== Restart Apache
As root:
apachectl restart
[[nginx]]
== Nginx
=== Nginx for BSD/ports
As root:
pkg install recoll
Do what you need to do to configure the indexing and check that the normal
search works. Take care to be logged in as the user you want to run the web
search as while you do this.
Install required packages:
pkg install nginx uwsgi git
=== Nginx: clone the webui repository
rm /usr/local/www/nginx
mkdir /usr/local/www/nginx
cd /usr/local/www/nginx
git clone https://github.com/koniu/recoll-webui.git recoll-webui-master
Important: most input handler helper applications (e.g. 'pdftotext') are
installed in '/usr/local/bin' which is not in the PATH as seen by Nginx
(at least on DragonFly). The simplest way to fix this is to modify the
launcher module for the webui app so that it fixes the PATH.
Edit 'recoll-webui-master/webui-wsgi.py' and add the following line after
the 'import os' line:
os.environ['PATH'] = os.environ['PATH'] + ':' + '/usr/local/bin'
Also comment out the existing 'os.chdir' call and replace it with one that
uses the actual installation path:
#os.chdir(os.path.dirname(__file__))
os.chdir('/usr/local/www/nginx/recoll-webui-master')
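One possible reason the original call fails, stated here as an assumption rather than something the source confirms, is that `__file__` can be a bare relative name when the script is loaded by uWSGI. In that case `os.path.dirname(__file__)` is an empty string and `os.chdir('')` raises an error. The sketch below shows a variant that resolves the path first and falls back to the hard-coded directory; the helper name `app_directory` is hypothetical.

```python
import os

# Fallback matching the clone location used in this section.
FALLBACK_DIR = "/usr/local/www/nginx/recoll-webui-master"


def app_directory(script_path, fallback=FALLBACK_DIR):
    """Return the absolute directory of script_path, or fallback if empty."""
    if not script_path:
        return fallback
    return os.path.dirname(os.path.abspath(script_path))
```

In 'webui-wsgi.py' the call would then read `os.chdir(app_directory(__file__))`.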
=== Nginx: configure uWSGI
Assuming the user running the search is 'dockes' (change it to your user),
run the following as root:
sysrc uwsgi_uid=$(id -u dockes)
sysrc uwsgi_gid=$(id -g dockes)
sysrc uwsgi_flags="-M -L --wsgi-file /usr/local/www/nginx/recoll-webui-master/webui-wsgi.py"
Alternatively, add the following to '/etc/rc.conf':
uwsgi_uid="dockes"
uwsgi_gid="dockes"
uwsgi_flags="-M -L --wsgi-file /usr/local/www/nginx/recoll-webui-master/webui-wsgi.py"
=== Configure nginx
Edit '/usr/local/etc/nginx/nginx.conf' and set up a proxy to the uwsgi
service:

    location / {
        include uwsgi_params;
        uwsgi_pass unix:///tmp/uwsgi.sock;
    }

=== Enable and start both services
As root:
sysrc uwsgi_enable=YES #Or uwsgi_enable="YES" (in rc.conf)
sysrc nginx_enable=YES #Or nginx_enable="YES" (in rc.conf)
service uwsgi start
service nginx start