Hi,
On an old laptop with 1 GB of RAM, memory usage with real-time indexing seems fine, except for one particular Excel spreadsheet which is ~30 MB in size. Memory usage rises to ~700 MB and the system starts swapping (swap space fills up too). The python process running xls-dump.py is the culprit; recollindex itself uses only around 12 MB.
I'm wondering whether that python script can be improved to limit its memory usage, or whether there could be a configuration option to skip processing files over a certain size.
Or any other tips to deal with this situation?
Thank you and thanks for your work on this useful tool.
Discussion
-
medoc
2017-10-04
The xls data extractor has trouble with a few xls files. It's still useful overall, but there are definitely a few cases where it will go into a loop or grow very large.
There is a configuration parameter which may help: filtermaxmbytes. It limits the maximum memory usage of the commands forked by recoll: https://www.lesbonscomptes.com/recoll/usermanual/webhelp/docs/RCL.INSTALL.CONFIG.RECOLLCONF.PERFS.html
I think that this parameter appeared in recoll 1.21.
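As a minimal sketch (the 500 MB value is only an illustration, pick whatever fits your machine), the setting would go in the personal configuration file, ~/.recoll/recoll.conf:

    # Cap the memory usage of the external input handlers (filters)
    # forked by recollindex, in megabytes. A filter that hits the cap
    # fails instead of driving the machine into swap, and the document
    # is then simply not indexed.
    filtermaxmbytes = 500

After editing the file, restart the indexer so the new value is picked up.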
-
Anonymous
2017-10-04
Thanks, that parameter looks like it will help.
-
medoc
2017-10-14
- status: open --> closed