For example, after saving the web page https://en.wikipedia.org/wiki/Logistic_regression as HTML, indexing the SVG
files in the accompanying files folder takes several minutes per file, although each file is small.
Discussion
-
Anonymous
2018-03-20 This happens under OpenSUSE 42.3 with Recoll 1.23.1.
-
medoc
2018-03-22 I tried to reproduce this on Debian and could not. Processing the full directory takes less than 1 s (on a moderately fast Core i5 750).
You can try the data extraction by executing something like:
/usr/share/recoll/filters/rclsvg.py - /some/svg/file.svg
Then maybe add traces to the Python code to see where it is slow?
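Before adding traces inside the filter, it may help to measure how long the extraction command takes from the outside. The following is only a minimal sketch: the helper name `time_command` and the 600-second timeout are illustrative assumptions, not part of Recoll.

```python
import subprocess
import sys
import time


def time_command(argv, timeout=600):
    """Run argv and return (elapsed_seconds, return_code, stdout_bytes).

    Hypothetical helper for illustration; raises subprocess.TimeoutExpired
    if the command runs longer than `timeout` seconds.
    """
    start = time.perf_counter()
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    elapsed = time.perf_counter() - start
    return elapsed, proc.returncode, proc.stdout


# Example call, using the command quoted above (adjust both paths):
# elapsed, code, out = time_command(
#     ["/usr/share/recoll/filters/rclsvg.py", "-", "/some/svg/file.svg"])
# print("exit code %d, %d bytes of output, %.2f s" % (code, len(out), elapsed))
```

If a single small file takes minutes here as well, the delay is in the filter itself rather than in the indexer, and traces inside the filter become worthwhile.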
-
medoc
2018-04-07
- status: open --> closed
- milestone: -->
-
medoc
2018-04-07 No feedback, can't reproduce.
-
Enno
2018-06-02 I am sorry for the late reply.
Yes, it stalls when tried on a 4 kB SVG file and finally gives:
== Entry 1 dlen 154 ipath (mimetype []):
-
medoc
2018-06-03
- status: closed --> open
-
medoc
2018-06-03 Would it be possible for you to share the SVG file (possibly privately: jf@dockes.org)?
-
Enno
2018-06-03 No problem, these are just SVG files obtained by saving an HTML page from Wikipedia in Firefox. One is attached.
Attachments
-
medoc
2018-06-04 The most likely cause I can think of is that the XML processor is downloading external documents (DTDs) from the network. Please try replacing /usr/share/recoll/filters/rclxslt.py with the attached copy, in which network downloading has been disabled.
On Debian, the local copy of the DTD comes with the sgml-data package and is located, e.g. for SVG, in /usr/share/xml/svg/svg11.dtd
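To check whether a given SVG file references an external DTD at all, something like the following sketch could be used. It relies on Python's standard `xml.parsers.expat` module, which reports the DOCTYPE declaration without fetching the external DTD itself. The function name `external_dtd` and the sample document are assumptions for illustration; the sample is a typical SVG 1.1 prolog, not the attached file.

```python
import xml.parsers.expat


def external_dtd(svg_bytes):
    """Return (public_id, system_id) from the DOCTYPE, or None if absent.

    Hypothetical helper for illustration; it only reads the declaration
    and does not download anything.
    """
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.append((public_id, system_id))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(svg_bytes, True)
    return found[0] if found else None


# Assumed sample: a minimal SVG with the usual SVG 1.1 DOCTYPE.
SAMPLE = b'''<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"/>
'''

print(external_dtd(SAMPLE))
```

A file for which this returns an `http://` system identifier is one where a network-enabled XML processor might try to download the DTD; a result of `None` would make the download hypothesis unlikely for that file.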
Attachments
-
Enno
2018-06-04 Thank you! Unfortunately it did not help; that was not the cause. The behavior is the same.