= Recoll WebUI Apache and nginx installation from scratch
NOTE: thanks to Michael L. Wilson for the `nginx` part.
The https://github.com/koniu/recoll-webui[Recoll WebUI] offers an
alternative, WEB-based, interface for querying a Recoll index.
It can be quite useful to extend the use of a shared index to multiple
workstations, without the need for a local Recoll installation and shared
data storage.
The Recoll WebUI is based on the
http://bottlepy.org/docs/dev/index.html[Bottle Python framework], which has
a built-in WEB server, and the simplest deployment approach is to run it
standalone. However the built-in server is restricted to handling one
request at a time, which is problematic in multi-user situations,
especially because some requests, like extracting a result list into a CSV
file, can take a significant amount of time.
The Bottle framework can work with several multi-threading Python HTTP
server libraries, but, given the limitations of the Recoll Python module
and the Python interpreter itself, this will not yield optimal performance,
and, especially can't efficiently leverage the now ubiquitous
multiprocessors.
In multi-user situations, you can get better performance and ease of use
from the Recoll WebUI by running it under Apache or Nginx rather than as a
standalone process. With this approach, a few requests per second can
easily be handled even in the presence of long-running ones.
Neither Recoll nor the WebUI are optimized for high multi-user load, and it
would be very unwise to use them as the search interface to a busy WEB
site.
The instructions about using the WebUI under Apache as given in the
repository README are a bit terse, and are missing a few details,
especially ones which impact performance.
Here follows the synopsis of three WebUI installations on initially
Apache-less Ubuntu (14.04) and DragonFly BSD systems, and for
Nginx/BSD. The first should extend easily to other Debian-based systems,
the second at least to FreeBSD. rpm-based systems are left as an exercise
to the reader, at least for now...
CAUTION: THE CONFIGURATIONS DESCRIBED HAVE NO ACCESS CONTROL. ANYONE WITH
ACCESS TO THE NETWORK WHERE THE SERVER IS LOCATED CAN RETRIEVE ANY
DOCUMENT.
link:#nginx[Jump to the nginx section].
[[apache]]
== Apache
=== On a Debian/Ubuntu system
==== Install recoll
sudo apt-get install recoll python-recoll
Configure the indexing and check that the normal search works (I spent
quite a lot of time trying to understand why the WebUI did not work, when
in fact it was the normal recoll configuration which was broken and the
regular search did not work either).
Take care to be logged in as the user you want to run the web search as
while you do this.
==== Install the WebUI
Clone the github repository, or extract the master tar installation, and
move it to '/var/www/recoll-webui-master/'. Take care that it is read/execute
accessible by your user.
==== Install Apache and mod-wsgi
sudo apt-get install apache2 libapache2-mod-wsgi
I then got the following message:
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1. Set the 'ServerName' directive globally to suppress this message
To clear it, I added a ServerName directive to the Apache config, maybe you
won't need it. Edit '/etc/apache2/sites-available/000-default.conf' and add
the following at the top (globally). Things work without this fix anyway,
this is just to suppress the error message. You probably need to adjust the
address or use a real host name:
ServerName 192.168.4.6
Edit '/etc/apache2/mods-enabled/wsgi.conf', add the following at the end of
the "IfModule" section.
Change the user ('dockes' in the example) taking care that he is the one who
owns the index ('.recoll' is in his home directory).
WSGIDaemonProcess recoll user=dockes group=dockes \
threads=1 processes=5 display-name=%{GROUP} \
python-path=/var/www/recoll-webui-master
WSGIScriptAlias /recoll /var/www/recoll-webui-master/webui-wsgi.py
<Directory /var/www/recoll-webui-master>
WSGIProcessGroup recoll
Order allow,deny
allow from all
</Directory>
NOTE: the Recoll WebUI application is mostly single-threaded, so it is of
little use (and may actually be counter-productive in some cases) to
specify multiple threads on the WSGIDaemonProcess line. Specify multiple
processes instead to put multiple CPUs to work on simultaneous requests.
Then run the following to restart Apache:
sudo apachectl restart
The Recoll WebUI should now be accessible. on 'http://my.server.com/recoll/'
NOTE: Take care that you need a '/' at the end of the URL used to access
the search (use: 'http://my.server.com/recoll/', not
'http://my.server.com/recoll'), else files other than the script itself are
not found (the page looks weird and the search does not work).
CAUTION: THERE IS NO ACCESS CONTROL. ANYONE WITH ACCESS TO THE NETWORK
WHERE THE SERVER IS LOCATED CAN RETRIEVE ANY DOCUMENT.
=== Apache Variant for BSD/ports
==== Packages
As root:
pkg install recoll
Do what you need to do to configure the indexing and check that the normal
search works.
Take care to be logged in as the user you want to run the web search as
while you do this.
pkg install apache24
Add apache24_enable="YES" in /etc/rc.conf
pkg install ap24-mod_wsgi4
pkg install git
==== Clone the webui repository
cd /usr/local/www/apache24/
git clone https://github.com/koniu/recoll-webui.git recoll-webui-master
Important: most input handler helper applications (e.g. 'pdftotext') are
installed in '/usr/local/bin' which is not in the PATH as seen by Apache
(at least on DragonFly). The simplest way to fix this is to modify the
launcher module for the webui app so that it fixes the PATH.
Edit 'recoll-webui-master/webui-wsgi.py' and add the following line after
the 'import os' line:
os.environ['PATH'] = os.environ['PATH'] + ':' + '/usr/local/bin'
==== Configure Apache
Edit /usr/local/etc/apache24/modules.d/270_mod_wsgi.conf
Uncomment the LoadModule line, and add the directives to alias /recoll/ to
the webui script.
Change the user (dockes in the example) taking care that he is the one who
owns the index (.recoll is in his home directory).
Contents of the file:
## $FreeBSD$
## vim: set filetype=apache:
##
## module file for mod_wsgi
##
## PROVIDE: mod_wsgi
## REQUIRE:
LoadModule wsgi_module libexec/apache24/mod_wsgi.so
WSGIDaemonProcess recoll user=dockes group=dockes \
threads=1 processes=5 display-name=%{GROUP} \
python-path=/usr/local/www/apache24/recoll-webui-master/
WSGIScriptAlias /recoll /usr/local/www/apache24/recoll-webui-master/webui-wsgi.py
<Directory /usr/local/www/apache24/recoll-webui-master>
WSGIProcessGroup recoll
Require all granted
</Directory>
==== Restart Apache
As root:
apachectl restart
[[nginx]]
== Nginx
=== Nginx for BSD/ports
As root:
pkg install recoll
Do what you need to do to configure the indexing and check that the normal
search works. Take care to be logged in as the user you want to run the web
search as while you do this.
Install required packages:
pkg install nginx uwsgi git
=== Nginx: clone the webui repository
rm /usr/local/www/nginx
mkdir /usr/local/www/nginx
cd /usr/local/www/nginx
git clone https://github.com/koniu/recoll-webui.git recoll-webui-master
Important: most input handler helper applications (e.g. 'pdftotext') are
installed in '/usr/local/bin' which is not in the PATH as seen by Nginx
(at least on DragonFly). The simplest way to fix this is to modify the
launcher module for the webui app so that it fixes the PATH.
Edit 'recoll-webui-master/webui-wsgi.py' and add the following line after
the 'import os' line:
os.environ['PATH'] = os.environ['PATH'] + ':' + '/usr/local/bin'
Also change the following to find the correct path:
#os.chdir(os.path.dirname(__file__))
os.chdir('/usr/local/www/nginx/recoll-webui-master')
=== Nginx: configure uWSGI
Assuming the user running the search is "dockes" (change it to your user),
sysrc uwsgi_uid=$(id -u dockes)
sysrc uwsgi_gid=$(id -g dockes)
sysrc uwsgi_flags="-M -L --wsgi-file /usr/local/www/nginx/recoll-webui-master/webui-wsgi.py"
(ALTERNATIVELY)
Add the following to rc.conf
uwsgi_uid="dockes"
uwsgi_gid="dockes"
uwsgi_flags="-M -L --wsgi-file /usr/local/www/nginx/recoll-webui-master/webui-wsgi.py"
=== Configure nginx
Edit /usr/local/etc/nginx/nginx.conf and set up a proxy to uwsgi service:
location / {
include uwsgi_params;
uwsgi_pass unix:///tmp/uwsgi.sock;
}
=== Enable and start both services
As root:
sysrc uwsgi_enable=YES #Or uwsgi_enable="YES" (in rc.conf)
sysrc nginx_enable=YES #Or nginx_enable="YES" (in rc.conf)
service uwsgi start
service nginx start