Is there a way for recoll to index certain "homogeneous" compound document types, such as .chm and .epub files, as if they were "atomic", like .pdf files?
If not, is there a way of coalescing multiple search matches from the same "homogeneous" compound document into a single match in the search results?
Discussion
-
medoc
2018-01-05
rclchm had an option to concatenate the chapters, but it was a local variable, so you had to modify the script to use it.
I just changed rclchm and rclepub to look at the epubcatenate and chmcatenate variables in recoll.conf and concatenate the chapters if they are set (= 1). You can get the scripts from git and replace the ones in /usr/share/recoll/filters.
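As a rough sketch, the settings would look something like this. The file location is an assumption (the default personal configuration, ~/.recoll/recoll.conf); the variable names are the ones mentioned above.

```
# ~/.recoll/recoll.conf (assumed default location)
# Index each .epub / .chm file as a single document instead of one entry per chapter.
epubcatenate = 1
chmcatenate = 1
```

Files that are already in the index will probably need to be reindexed before the change shows up in the search results.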
The new versions will be in the next recoll release.
Please let me know how this works for you.
jf
Last edit: medoc 2018-01-06
-
Anonymous
2018-01-07
Maybe I'm doing something wrong, but I can't get the new filters to work properly. If I don't set the variables, I get the old behavior, as expected. But if I set them, only the filenames seem to get indexed, not the file contents.
-
medoc
2018-01-07
I realised that this feature needs the 'lynx' command to be installed. It's used to extract the text from the concatenated document. I'm not too sure why, actually, but the concatenating version of rclchm used lynx, and I kept it.
So this is the most probable cause. Please install lynx and retry.
Otherwise, you can execute the filter 'by hand'. It will probably print error messages; please paste them here:
/usr/share/recoll/filters/rclepub /path/to/some/file.epub
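Before running the filter, a quick check that lynx is actually on the PATH can save a round trip. This is only a sketch: have_cmd is a helper name made up for the example, not something that ships with recoll.

```shell
# have_cmd NAME: succeed if NAME resolves to a command on the PATH.
# (Hypothetical helper for this example, not part of recoll.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd lynx; then
    echo "lynx is installed"
else
    echo "lynx is missing: install it, then retry the indexing"
fi
```

If lynx is reported as installed and the contents are still not indexed, the output of the 'by hand' filter run above is the next thing to look at.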
-
medoc
2018-01-14
Now works as far as I can see.
-
Anonymous
2018-01-15
Yes, it works now with lynx installed.
Thank you.
On a related note: is recoll capable of indexing .mobi files, and if so, what are the requirements?
Last edit: Anonymous 2018-01-15
-
medoc
2018-01-22
Sorry for the delay in answering. There is currently no support for indexing the contents of mobi files. I could not find a Python module for reliably accessing them, though I did not make much of an effort. If you find any tool which can dump the text, it would be very easy to add a mobi handler.
-
medoc
2018-01-22
- status: open --> closed