= Recoll WebUI Apache installation from scratch

The https://github.com/koniu/recoll-webui[Recoll WebUI] offers an
alternative, web-based interface for querying a Recoll index. It is useful
for extending the use of a shared index to multiple workstations, without
the need for a local Recoll installation and shared data storage.

The Recoll WebUI is based on the
http://bottlepy.org/docs/dev/index.html[Bottle Python framework], which has
a built-in web server, and the simplest deployment approach is to run it
standalone. However, the built-in server handles only one request at a
time, which is problematic in multi-user situations, especially because
some requests, like extracting a result list into a CSV file, can take a
significant amount of time.

The Bottle framework can work with several multi-threaded Python HTTP
server libraries but, given the limitations of the Recoll Python module
and of the Python interpreter itself, this does not yield optimal
performance. In particular, it cannot make efficient use of the now
ubiquitous multiprocessor machines.

In multi-user situations, you can get better performance and ease of use
from the Recoll WebUI by running it under Apache rather than as a
standalone process. With this approach, a few requests per second can
easily be handled even in the presence of long-running ones.

Neither Recoll nor the WebUI is optimized for high multi-user load, and it
would be very unwise to use them as the search interface to a busy
website.

The instructions for using the WebUI under Apache given in the repository
README are a bit terse and are missing a few details, especially ones
which affect performance.

What follows is a summary of two WebUI installations on initially
Apache-less Ubuntu (14.04) and DragonFly BSD systems. The first should
extend easily to other Debian-based systems, the second at least to
FreeBSD. RPM-based systems are left as an exercise for the reader, at
least for now.

CAUTION: THE CONFIGURATIONS DESCRIBED HAVE NO ACCESS CONTROL. ANYONE WITH
ACCESS TO THE NETWORK WHERE THE SERVER IS LOCATED CAN RETRIEVE ANY
DOCUMENT.

== On a Debian/Ubuntu system

=== Install recoll

    sudo apt-get install recoll python-recoll

Configure indexing and check that the normal search works. (I spent quite
a lot of time trying to understand why the WebUI did not work, when in
fact the regular Recoll configuration was broken and the normal search did
not work either.)

Take care to be logged in as the user that the web search will run as
while you do this.
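
A quick way to confirm that the indexing was done as the intended user is to check that the configuration directory exists in that user's home and belongs to them. The following is only a minimal Python 3 sketch: it assumes the default location ('.recoll' in the home directory), and the function name `check_index_dir` is made up for this illustration.

```python
import os


def check_index_dir(home=None):
    """Return (ok, detail) describing the Recoll configuration directory.

    Assumption: the default location, '.recoll' under the home directory.
    """
    if home is None:
        home = os.path.expanduser("~")
    confdir = os.path.join(home, ".recoll")
    if not os.path.isdir(confdir):
        return False, "%s does not exist: run the indexer as this user first" % confdir
    owner = os.stat(confdir).st_uid
    if owner != os.geteuid():
        return False, "%s is owned by uid %d, not by the current uid %d" % (
            confdir, owner, os.geteuid())
    return True, "%s exists and is owned by the current user" % confdir


if __name__ == "__main__":
    # Run this while logged in as the user the web search will run as.
    print(check_index_dir()[1])
```

This does not replace running a real query with the normal search. It only catches the case where the index was created under a different account.
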

=== Install the WebUI

Clone the GitHub repository, or extract the tar archive of the master
branch, and move it to '/var/www/recoll-webui-master/'. Make sure that your
user has read and execute access to it.
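
The access check can be automated. The sketch below (Python 3, standard library only) walks the installation directory and lists anything the current user cannot read or traverse. The function name `find_inaccessible` is hypothetical, and the script is meant to be run as the user the WebUI will run as.

```python
import os


def find_inaccessible(root):
    """List paths under root that the current user cannot read (files)
    or cannot read and traverse (directories)."""
    if not os.access(root, os.R_OK | os.X_OK):
        return [root]
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.R_OK | os.X_OK):
                problems.append(path)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.R_OK):
                problems.append(path)
    return problems


if __name__ == "__main__":
    # Path taken from the text above; adjust if you installed elsewhere.
    for path in find_inaccessible("/var/www/recoll-webui-master"):
        print("not accessible:", path)
```

An empty output means every file and directory is accessible to the user running the script.
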

=== Install Apache and mod-wsgi

    sudo apt-get install apache2 libapache2-mod-wsgi

I then got the following message:

    AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1. Set the 'ServerName' directive globally to suppress this message

To clear it, I added a ServerName directive to the Apache configuration;
you may not need it. Things work without this fix anyway: it only
suppresses the message. Edit
'/etc/apache2/sites-available/000-default.conf' and add the following at
the top (globally). You will probably need to adjust the address or use a
real host name:

    ServerName 192.168.4.6

Edit '/etc/apache2/mods-enabled/wsgi.conf' and add the following at the
end of the "IfModule" section.

Change the user ('dockes' in the example), making sure that it is the one
that owns the index ('.recoll' is in that user's home directory).

    WSGIDaemonProcess recoll user=dockes group=dockes \
        threads=1 processes=5 display-name=%{GROUP} \
        python-path=/var/www/recoll-webui-master
    WSGIScriptAlias /recoll /var/www/recoll-webui-master/webui-wsgi.py
    <Directory /var/www/recoll-webui-master>
        WSGIProcessGroup recoll
        Order allow,deny
        allow from all
    </Directory>


NOTE: The Recoll WebUI application is mostly single-threaded, so it is of
little use (and may actually be counter-productive in some cases) to
specify multiple threads on the WSGIDaemonProcess line. Specify multiple
processes instead to put multiple CPUs to work on simultaneous requests.
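
To see the daemon processes at work, a throwaway WSGI script can report which process served each request. This is only a sketch and not part of the WebUI: the file name 'wsgitest.py' and the '/wsgitest' alias mentioned below are hypothetical. It relies only on the standard WSGI calling convention and on the 'mod_wsgi.process_group' key that mod_wsgi adds to the request environment.

```python
import os
import threading


def application(environ, start_response):
    """Minimal WSGI app reporting the process that handled the request."""
    text = "pid=%d uid=%d threads=%d group=%s\n" % (
        os.getpid(),                       # differs between daemon processes
        os.getuid(),                       # should be the uid set by user=...
        threading.active_count(),
        environ.get("mod_wsgi.process_group", ""),
    )
    data = text.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(data))),
    ])
    return [data]
```

Saved, for example, as 'wsgitest.py' in the WebUI directory and exposed with a temporary `WSGIScriptAlias /wsgitest` line, it should show the PID changing between requests once several processes are in use, and the uid of the index owner. Remove the alias when done.
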

Then run the following to restart Apache:

    sudo apachectl restart

The Recoll WebUI should now be accessible at 'http://my.server.com/recoll/'.

NOTE: The URL used to access the search must end with a '/' (use
'http://my.server.com/recoll/', not 'http://my.server.com/recoll').
Otherwise, files other than the script itself are not found: the page
looks weird and the search does not work.
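
The trailing-slash symptom is what relative URL resolution predicts, assuming the page refers to its resources with relative URLs. The Python 3 sketch below shows how a browser would resolve such a reference against each form of the address. The resource name 'static/style.css' is a made-up example, not necessarily a file of the WebUI.

```python
from urllib.parse import urljoin


def resource_url(page_url, relative_ref):
    """Resolve a relative reference the way a browser does for a page URL."""
    return urljoin(page_url, relative_ref)


# With the trailing slash, the reference stays under /recoll/:
print(resource_url("http://my.server.com/recoll/", "static/style.css"))
# http://my.server.com/recoll/static/style.css

# Without it, 'recoll' is treated as a file name and is dropped:
print(resource_url("http://my.server.com/recoll", "static/style.css"))
# http://my.server.com/static/style.css
```

In the second case the request falls outside the '/recoll' alias, so Apache never hands it to the WebUI.
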

CAUTION: THERE IS NO ACCESS CONTROL. ANYONE WITH ACCESS TO THE NETWORK
WHERE THE SERVER IS LOCATED CAN RETRIEVE ANY DOCUMENT.

== Variant for BSD/ports

=== Packages

As root:

    pkg install recoll

Configure indexing and check that the normal search works. Take care to be
logged in as the user that the web search will run as while you do this.

Then install Apache:

    pkg install apache24

Add the following line to '/etc/rc.conf':

    apache24_enable="YES"

Install mod_wsgi and git:

    pkg install ap24-mod_wsgi4
    pkg install git


=== Clone the webui repository

    cd /usr/local/www/apache24/
    git clone https://github.com/koniu/recoll-webui.git recoll-webui-master

IMPORTANT: Most input handler helper applications (e.g. 'pdftotext') are
installed in '/usr/local/bin', which is not in the PATH as seen by Apache
(at least on DragonFly). The simplest fix is to modify the launcher module
of the webui application so that it extends the PATH.

Edit 'recoll-webui-master/webui-wsgi.py' and add the following line after
the 'import os' line:

    os.environ['PATH'] = os.environ['PATH'] + ':' + '/usr/local/bin'

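
A slightly more defensive variant of the same fix is sketched below. It is an illustration, not the WebUI's own code: the helper name `add_to_path` is hypothetical. It tolerates a missing PATH variable and avoids appending the directory twice if the module is loaded more than once.

```python
import os


def add_to_path(directory, environ=os.environ):
    """Append directory to PATH unless it is already listed.

    Works on any mapping, so it can be tried on a plain dict first.
    """
    current = environ.get("PATH", "")          # PATH may be unset
    parts = [p for p in current.split(os.pathsep) if p]
    if directory not in parts:
        parts.append(directory)
    environ["PATH"] = os.pathsep.join(parts)
    return environ["PATH"]
```

In 'webui-wsgi.py', the call would then be `add_to_path('/usr/local/bin')`, placed after the 'import os' line like the one-line version.
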

=== Configure apache

Edit '/usr/local/etc/apache24/modules.d/270_mod_wsgi.conf'. Uncomment the
LoadModule line, and add the directives to alias '/recoll/' to the webui
script.

Change the user ('dockes' in the example), making sure that it is the one
that owns the index ('.recoll' is in that user's home directory).

Contents of the file:

    ## $FreeBSD$
    ## vim: set filetype=apache:
    ##
    ## module file for mod_wsgi
    ##
    ## PROVIDE: mod_wsgi
    ## REQUIRE:

    LoadModule wsgi_module libexec/apache24/mod_wsgi.so
    WSGIDaemonProcess recoll user=dockes group=dockes \
        threads=1 processes=5 display-name=%{GROUP} \
        python-path=/usr/local/www/apache24/recoll-webui-master/
    WSGIScriptAlias /recoll /usr/local/www/apache24/recoll-webui-master/webui-wsgi.py
    <Directory /usr/local/www/apache24/recoll-webui-master>
        WSGIProcessGroup recoll
        Require all granted
    </Directory>


=== Restart apache

As root:

    apachectl restart

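
After the restart, a small script can confirm that the WebUI answers over HTTP. This is a minimal Python 3 sketch using only the standard library. The function name `check_webui` is hypothetical, and the host name must be replaced with your own.

```python
import urllib.error
import urllib.request


def check_webui(url, timeout=10):
    """Return (ok, detail) for a GET request on the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read()
            ok = resp.status == 200 and len(body) > 0
            return ok, "HTTP %d, %d bytes" % (resp.status, len(body))
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (403, 404, 500...).
        return False, "HTTP %d" % exc.code
    except (urllib.error.URLError, OSError) as exc:
        # No answer at all: wrong host, Apache not running, timeout...
        return False, "connection failed: %s" % exc


# Example call (note the trailing slash):
#   print(check_webui("http://my.server.com/recoll/"))
```

An HTTP 500 answer usually means that Apache reached the script but the script failed, in which case the Apache error log is the place to look.
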