None
closed
nobody
svg (1)
3 days ago
2018-03-20
Anonymous
No

For example, after saving the web page https://en.wikipedia.org/wiki/Logistic_regression as HTML, indexing the SVG files in the files folder takes several minutes per file, even though each file is small.

Discussion

  • medoc
    2018-03-22

    I tried to reproduce this on Debian and couldn't. Processing the full directory takes less than 1 s (on a moderately fast Core i5 750).

    You can try the data extraction by executing something like:

    /usr/share/recoll/filters/rclsvg.py - /some/svg/file.svg
    

    Then maybe add traces to the Python code to see where it's slow?
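
As a rough sketch of the timing step suggested above, the filter run can be wrapped in a small Python helper that measures wall-clock time. The helper name `time_command` and the timeout value are hypothetical, chosen only for illustration; the filter path is the one quoted in the post and the SVG path is a placeholder.

```python
import subprocess
import time


def time_command(argv, timeout=600):
    """Run a command and return (elapsed_seconds, return_code, stdout_bytes).

    `argv` is the command as a list of strings. `timeout` (seconds) is an
    assumed safety limit so that a stalled filter does not hang forever;
    subprocess.TimeoutExpired is raised if it is exceeded.
    """
    start = time.perf_counter()
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    elapsed = time.perf_counter() - start
    return elapsed, proc.returncode, proc.stdout


# Example call, using the command from the post (paths are placeholders):
# elapsed, rc, out = time_command(
#     ["/usr/share/recoll/filters/rclsvg.py", "-", "/some/svg/file.svg"])
# print("%.2f s, exit code %d, %d bytes of output" % (elapsed, rc, len(out)))
```

If the run is slow, and assuming the filter runs under Python 3, the standard profiler may help narrow it down further: `python3 -m cProfile -s cumtime` followed by the same filter command line.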

     
  • medoc
    2018-04-07

    • status: open --> closed
    • milestone: -->
     
  • medoc
    2018-04-07

    no feedback, can't reproduce

     
  • Enno
    2018-06-02

    I am sorry for the late reply.

    Yes, it stalls when tried on a 4 kB SVG file and finally gives

    == Entry 1 dlen 154 ipath (mimetype []):

     
  • medoc
    2018-06-03

    • status: closed --> open
     
  • medoc
    2018-06-03

    Would it be possible for you to share the SVG file (possibly privately: jf@dockes.org)?

     
  • medoc
    2018-06-04

    The most likely cause I can think of is that the XML processor is downloading external documents (DTDs) from the network. Please try to replace /usr/share/recoll/filters/rclxslt.py with the attached copy where network downloading has been disabled.

    On Debian, the local copies of the DTDs come with the sgml-data package and are located, e.g. for SVG, in /usr/share/xml/svg/svg11.dtd
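
To check whether a given SVG file could trigger such a download at all, one option is to look for a DOCTYPE whose system identifier is an http(s) URL. The sketch below is only a diagnostic under that assumption: the function name `external_dtd_url` is hypothetical, the regular expression covers the common `PUBLIC`/`SYSTEM` forms rather than every legal DOCTYPE, and whether the parser actually fetches the DTD depends on how the filter configures it.

```python
import re

# Matches <!DOCTYPE name PUBLIC "public-id" "system-id"> and
# <!DOCTYPE name SYSTEM "system-id">, capturing the system identifier.
_DOCTYPE_RE = re.compile(
    r"""<!DOCTYPE\s+[^\s>]+\s+
        (?:PUBLIC\s+(?:"[^"]*"|'[^']*')\s+|SYSTEM\s+)
        (?:"([^"]*)"|'([^']*)')""",
    re.VERBOSE,
)


def external_dtd_url(xml_text):
    """Return the DTD URL if the document declares a remote DTD, else None."""
    match = _DOCTYPE_RE.search(xml_text)
    if match is None:
        return None  # no DOCTYPE with an external identifier
    system_id = match.group(1) if match.group(1) is not None else match.group(2)
    if system_id.lower().startswith(("http://", "https://")):
        return system_id
    return None  # local or relative DTD reference


# Example with the standard SVG 1.1 DOCTYPE:
sample = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" '
    '"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">\n'
    '<svg xmlns="http://www.w3.org/2000/svg"/>\n'
)
print(external_dtd_url(sample))
```

A `None` result for every affected file would suggest that DTD downloading is not the cause for those files.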

     
    Attachments
  • Enno
    2018-06-04

    Thank you! Unfortunately it did not help; the behavior is the same.

     