
Diff of /website/idxthreads/forkingRecoll.txt [12682b] .. [f99c3a]



...

== Abstract

== Introduction

The Recoll indexer, *recollindex*, is a big process which executes many
others, mostly for extracting text from documents. Some of the executed
processes are quite short-lived, and the time used by the process execution
machinery can actually dominate the time used to translate data. This
document explores possible approaches to improving performance without
adding excessive complexity or damaging reliability.

Studying fork/exec performance is not exactly a new venture, and there are
many texts which address the subject. While researching, though, I found
that few of them were accurate and that many questions were left as an
exercise for the reader.
...

+exec()+ then replaces part of the newly executing process with an address
space initialized from an executable file, inheriting some of the resources
under various conditions.
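
The fork-then-exec sequence described above can be sketched in a few lines.
This is only an illustration written with Python's thin wrappers around the
system calls (+os.fork()+, +os.execvp()+, +os.waitpid()+), not code taken
from Recoll, and the helper name +run_child+ is hypothetical.

```python
import os
import sys


def run_child(argv):
    """Run argv in a child process using fork() then exec().

    Returns the child's exit status, or minus the signal number
    if the child was killed by a signal.
    """
    pid = os.fork()  # duplicate the calling process
    if pid == 0:
        # Child: replace the copied address space with the new program.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            pass
        # exec failed: leave at once, without running the parent's cleanup.
        os._exit(127)
    # Parent: wait for the child and decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    print(run_child([sys.executable, "-c", "print('hello from child')"]))
```

Everything the parent had in memory is (logically) copied by +os.fork()+ and
then thrown away by +os.execvp()+, which is the copy-before-discard cost
discussed next.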

This was all fine with the small processes of the first Unix systems, but
as time progressed, processes became bigger and the copy-before-discard
operation was found to waste significant resources. It was optimized using
two methods (at very different points in time):

 - The first approach was to supplement +fork()+ with the +vfork()+ call, which
   is similar but does not duplicate the address space: the new process
   thread executes in the old address space. The old thread is blocked
   until the new one calls +exec()+ and frees up access to the memory
...
a single thread, and +fork()+ if it ran multiple ones.

After another careful look at the code, I could see few issues with
using +vfork()+ in the multithreaded indexer, so this was committed. 

The only change necessary was to get rid of an implementation of the
+closefrom()+ call, which Linux lacks (it is used to close all open
descriptors greater than or equal to a given value). The previous Recoll
implementation listed the +/proc/self/fd+ directory to look for open
descriptors, but this was unsafe because of possible memory allocations in
+opendir()+ etc.
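
A descriptor-closing loop which does not need to list a directory can be
sketched as follows. This is a minimal illustration in Python, not the actual
Recoll code (which is C++): the name +close_from+, its +limit+ parameter and
the 65536 cap are assumptions made for the example. The upper bound is taken
from +RLIMIT_NOFILE+. In C, the same loop would only need +close()+, and the
limit would presumably be computed before forking.

```python
import os
import resource


def close_from(lowfd, limit=None):
    """Close every descriptor in [lowfd, limit) by brute force.

    No directory listing is involved: each candidate descriptor is
    simply closed, and errors for unopened descriptors are ignored.
    """
    if limit is None:
        soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        # Assumption for this sketch: cap an unlimited or very large
        # soft limit so that the loop stays reasonably short.
        if soft == resource.RLIM_INFINITY or soft > 65536:
            soft = 65536
        limit = soft
    for fd in range(lowfd, limit):
        try:
            os.close(fd)
        except OSError:
            pass  # descriptor was not open


if __name__ == "__main__":
    r, w = os.pipe()
    close_from(w, w + 1)  # closes only the write end of the pipe
    os.close(r)
```

Python also offers +os.closerange()+, which performs the same kind of loop.
The price of the brute-force approach is one system call per possible
descriptor, whether it is open or not.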

...
No surprise here: given the implementation of +posix_spawn()+, it gets the
same times as the +fork()+/+vfork()+ options.
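
For readers who have not used it, +posix_spawn()+ combines process creation
and program execution in a single call. Here is a minimal sketch using
Python's +os.posix_spawn()+ wrapper (available since Python 3.8). The helper
name +spawn_and_wait+ is hypothetical, and this is not the code which was
used for the measurements.

```python
import os
import sys


def spawn_and_wait(argv):
    """Start argv with posix_spawn() and return its exit status.

    argv[0] must be a path to the executable: posix_spawn() does
    not search PATH.
    """
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    print(spawn_and_wait([sys.executable, "-c", "print('spawned')"]))
```

The caller never runs any code of its own in the child, which is why the
interface leaves the implementation free to choose how the new process is
created.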

The tests were performed on an Intel Core i5 750 (4 cores, 4 threads).




It would be painful to play it safe and discard the 60% reduction in
execution time offered by using +vfork()+, so this was adopted for Recoll
1.21. To this day, no problems have been discovered, but I am still
crossing my fingers...

The last line in the table is just for fun: *recollindex* 1.18
(single-threaded) needed almost 6 times as long to process the same
files...

////
Objections to vfork: 
  sigaction locks
https://bugzilla.redhat.com/show_bug.cgi?id=193631
