None
closed
nobody
None
2018-01-14
2017-10-20
Rich R
No

When I run this search, "Result count (est.): 0
Query details: Query(((accommodation OR accommodation#10) AND (XP PHRASE 6 XPhome PHRASE 6 XPrich PHRASE 6 XPDropbox PHRASE 6 XPPublications PHRASE 6 XPdigital humanities))) "

I do not get any results, although I know that the terms appear in many docs in the subdirectory in question. This seems to happen any time I run a search constrained to a path. It may depend on how many levels deep the path is, because as I gradually remove folders from the path, at some point all of the results for the larger scope appear.

Running Manjaro Linux, up to date, Cinnamon 3.4.6, Linux kernel 4.9. I deleted all config files, reinstalled recoll, and reindexed.

Discussion

  • medoc
    2017-10-20

    I tried the same path and query, and this appears to work for me. Could you please tell me what recoll version you are using (from the Help menu)?

    Result count (est.): 1
    Query details: ((accommodation:(wqf=11) AND (XP PHRASE 6 XPhome PHRASE 6 XPdockes PHRASE 6 XPDropbox PHRASE 6 XPPublications PHRASE 6 XPdigital humanities)))

     
  • Rich R
    2017-10-20

    Recoll 1.23.2 + Xapian 1.4.4
    Actually, it still seems to be indexing, so let me try again whenever it finishes the index and report back. This behavior was happening before I deleted the index files and started fresh though.

    Here is idxstatus.txt. filesdone keeps increasing, so I assume indexing is not complete yet. Is there a way to tell when the index is complete?

    phase = 1
    docsdone = 144962
    filesdone = 205529
    fileerrors = 662
    dbtotdocs = 177361
    totfiles = 205529
    fn = (path to file removed)

     
    Last edit: Rich R 2017-10-23
  • Rich R
    2017-10-23

    Recoll appears to be indexing config files despite skipped path entry "/home/rich/.*"

    Search Query(((accommodation OR accommodation#10) AND (XP PHRASE 6 XPhome PHRASE 6 XPrich PHRASE 6 XPDropbox PHRASE 6 XPPublications PHRASE 6 XPdigital humanities)))

    returns empty, but I know the term appears. This is happening regularly on other searches as well.

     
  • medoc
    2017-10-26

    Please have a look at this page:

    http://www.lesbonscomptes.com/recoll/faqsandhowtos/WhyIsMyFileNotIndexed.html

    and use recollindex -e -i to check that the files you can't find are actually indexed

    About skippedPaths: are you sure that you have no typo in the variable name? This works as far as I know. If possible, attach your recoll.conf file so that I can check.

     
    • Rich R
      2017-10-26

      Thank you mssr medoc! Unfortunately, I had a hardware problem and the box is in the shop right now. As soon as I get it back I will run through the processes you pointed me to, so plz don't close the ticket yet.

       
  • firef
    2017-11-27

    • Status: open --> accepted
     
  • medoc
    2017-11-29

    • status: accepted --> open
    • milestone: -->
     
  • Rich R
    2017-11-30

    It was taking forever to index. It seemed to be trying to index .svg files from compressed ebooks and getting stuck there. I added svg files to the ignored endings to see if that moves the process along any better.

     
  • Rich R
    2017-11-30

    It is taking a long time indexing pdfs now, spending over two minutes on one.

     
  • Rich R
    2017-12-05

    Hi, back again. I did a reinstall of Manjaro Cinnamon edition, updated everything, and ran the indexer again with a newly written conf file that fixed some earlier problems. It took a while to reset the box, and it took several more days for recoll to complete the indexing. As far as I can tell, everything is indexed, including the ADA path. If I search for "accommodation" with no path specified, it finds all instances, including the ones in the ADA path. If I add a filename clause for a specific file that I know contains the search term, the search finds nothing. If I delete the filename clause and specify the full path to the ADA directory, the search for "accommodation" yields no results. Both the filename search and the path search also fail with the path in quotes, on the off chance that file and path names with spaces need to be quoted. If I check "Invert" and run with the ADA path, no quotes, it returns all instances, both within the subtree and elsewhere.

    When I run recollindex -i as per the debug page, the logs show the files as already indexed with no problem. When I run recollindex on the whole index, it shows everything up to date and appears to exit without error. Is there anything in particular to search for in the logs? I can't share the logs for obvious privacy reasons.

    Thanks for your help!

     
    Last edit: Rich R 2017-12-05
  • medoc
    2017-12-05

    Could you please attach a screenshot of the query screen (from what you write, I guess that you are using the advanced search dialog)?

     
  • medoc
    2018-01-02

    Thanks for the log. There does not appear to be anything special in it though. This really looks like a query-time issue.

    The slowness of queries is probably due to the issue described in the following:
    https://lists.xapian.org/pipermail/xapian-discuss/2017-December/009564.html

    There is a workaround in recoll 1.23.6. You can check that this is the issue that you are having by unchecking "Preferences->GUI Configuration->Search Parameters->Dynamically build abstracts"

    About the main issue of missing results: I installed Manjaro to be in the same configuration, and I just can't reproduce it: the directory filtering works as expected when I check or uncheck the "Invert" checkbox.

    Do you see the same issue if you use the 'simple search' entry (the default one) in 'Query language' with a search like:

     accommodation dir:"/home/rich/Dropbox/digital humanities"
    

    Or

     accommodation -dir:"/home/rich/Dropbox/digital humanities"
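
The two queries above differ only in the leading minus on the dir: clause. As a minimal sketch (the helper name `build_dir_query` is hypothetical and not part of recoll), the strings could be assembled like this in Python, always quoting the path as in the examples so that an element containing a space stays inside one clause:

```python
def build_dir_query(terms, directory, exclude=False):
    """Return a query string restricting `terms` to (or away from) a directory."""
    if '"' in directory:
        # Escaping rules for embedded quotes are not covered in this thread,
        # so this sketch refuses such paths instead of guessing.
        raise ValueError("directory contains a double quote; not handled here")
    prefix = "-" if exclude else ""
    # Always quote the path, following the quoting used in the examples above.
    return f'{terms} {prefix}dir:"{directory}"'


if __name__ == "__main__":
    path = "/home/rich/Dropbox/digital humanities"
    print(build_dir_query("accommodation", path))
    print(build_dir_query("accommodation", path, exclude=True))
```

The resulting strings are meant to be pasted into the simple search entry with 'Query language' selected.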
    
     
  • medoc
    2018-01-02

    Another thing which would be very interesting would be the log from the query session

     
  • Rich R
    2018-01-03

    The query speed issue went away when I unchecked "Dynamically build abstracts", thanks for that. The Manjaro repo recoll is on 1.23.3, so 1.23.6 should drop soon. I am attaching two new screenshots (for the "thunder" search) done with simple search. Same results: the dir: search returns nothing and the -dir: search returns stuff from inside the subtree that should have been excluded. I am attaching the query log, made by setting the GUI to verbosity 6, renaming the log file to recoll-qry.log, restarting the GUI, running the searches, and closing the GUI. LMK if I should generate it some other way. Doing the method on the debug page (recollindex > /tmp/myindexlog 2>&1) produces huge log files.

     
  • Rich R
    2018-01-03

    thank you for taking the time to deal with this. I hope it is not some boneheaded thing on my part!

     
  • medoc
    2018-01-03

    It's either a bug or a usability issue, I would really like to get to the bottom of it.

    There was nothing in the query log.

    Maybe try like the following:

    In recoll.conf:

    logfilename = stderr
    loglevel = 6

    Then start the GUI from the command line as follows:

    recoll > /tmp/sometrace 2>&1

    Run the 2 queries, and attach the log.

    Also, if possible, attach the recoll.conf file.

     
  • Rich R
    2018-01-05

    here is the log file:

    qt5ct: using qt5ct plugin
    libpng warning: iCCP: known incorrect sRGB profile
    libpng warning: iCCP: known incorrect sRGB profile
    libpng warning: iCCP: known incorrect sRGB profile
    libpng warning: iCCP: known incorrect sRGB profile
    libpng warning: iCCP: known incorrect sRGB profile
    qt5ct: D-Bus system tray: no
    qt5ct: D-Bus global menu: no
    

    here is the conf file

    # The system-wide configuration files for recoll are located in:
    #   /usr/share/recoll/examples
    # The default configuration files are commented, you should take a look
    # at them for an explanation of what can be set (you could also take a look
    # at the manual instead).
    # Values set in this file will override the system-wide values for the file
    # with the same name in the central directory. The syntax for setting
    # values is identical.
    
    topdirs = /home/rich/.local/share/tomboy /run/media/rich/audio ~
    skippedNames = " _*.*" .* .beagle .bzr .cache .dropbox.cache .git \
    .hg .recoll* .svn .thumbnails .xsession-errors *~ *nltk_data* \
    #* bin Cache cache* caughtspam CVS index.*.html loop.ps \
    recoll.conf recollrc tmp Trash* xapiandb Fiddlybak*.html
    compressedfilemaxkbs = 500000
    textfilepagekbs = 10000
    textfilemaxmbs = 50
    noContentSuffixes = ,v .a .bak .com .dat .db .dll .exe .image .image.bz2 \
    .image.gz .image.xz .img .img.bz2 .img.gz .img.xz .lib \
    .jp2 .js .log .log.gz .map .md5 .mpp .mpt .msf .o .pid \
    .rdf .svg .sys .vsd # ~
    
    # added as per https://opensourceprojects.eu/p/recoll1/tickets/18/?page=1
    
    loglevel = 6
    logfilename = stderror
    

    also attaching two screenshots of searchresults page with query

     
    Last edit: Rich R 2018-01-05
  • Rich R
    2018-01-05

    sent via email

     
    • Rich R
      2018-01-05

      For some reason I was getting a 405 error from the browser when posting, but after logging out, logging back in, and reloading, the posts that gave errors are now posted. Sorry for any duplicate posts. It's OK to ignore the email since it is a copy of the post; let's just work here unless you wish to move to email!

       
  • medoc
    2018-01-06

    It's logfilename = stderr, not stderror. With stderror, the log went to a file named 'stderror'.
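
A typo like this is easy to miss because the value is still accepted, only as a file name. The following is a small Python sketch (the function `check_logfilename` is hypothetical, not a recoll tool) that scans recoll.conf text and flags a logfilename value that is neither `stderr` nor an absolute path. It assumes simple `name = value` lines and looks only at the first logfilename assignment it finds:

```python
import re


def check_logfilename(conf_text):
    """Return a warning string if logfilename looks suspicious, else None."""
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip comment lines
        match = re.match(r"logfilename\s*=\s*(\S+)", stripped)
        if not match:
            continue
        value = match.group(1)
        if value == "stderr" or value.startswith("/"):
            return None  # the special name, or an explicit absolute path
        # Anything else is taken as a file name, as happened in this thread.
        return (f"logfilename = {value!r} is not 'stderr'; "
                "the log will go to a file with that name")
    return None


if __name__ == "__main__":
    print(check_logfilename("loglevel = 6\nlogfilename = stderror\n"))
```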

     
    • Rich R
      2018-01-06

      oops. sorry. fixed recoll.conf and ran again, "-dir:" first, then "dir:". log attached.

       
      Last edit: Rich R 2018-01-06
      Attachments
  • medoc
    2018-01-07

    Thanks. I think I see what is happening: the query processor is downcasing Dropbox for some reason, so the query can't match the index, where path terms are stored literally. Now I just have to track down why it happens. It may take some time, but at least I have a string to pull on...
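
To illustrate why a downcased query term produces both symptoms reported in this thread (dir: finds nothing, -dir: returns everything), here is a toy Python model. It is only a sketch under assumptions: the XP-prefixed terms follow the "Query details" output shown earlier, but the matching is a simple leading-run comparison and not Xapian's actual phrase matching, the bug is modelled as downcasing every path element on the query side, and the file names are made up.

```python
def path_terms(path, downcase=False):
    """Split a path into XP-prefixed terms, one per element (toy model)."""
    parts = path.split("/")
    if downcase:
        parts = [p.lower() for p in parts]  # models the query-side bug
    return ["XP" + p for p in parts]


def in_dir(doc_path, query_dir, downcase_query=False):
    """True if the query's path terms are a leading run of the document's."""
    doc = path_terms(doc_path)  # index side: terms stored literally
    query = path_terms(query_dir, downcase=downcase_query)
    return doc[:len(query)] == query


def search(doc_paths, query_dir, exclude=False, downcase_query=False):
    """Keep documents inside query_dir, or outside it when exclude is set."""
    return [p for p in doc_paths
            if in_dir(p, query_dir, downcase_query) != exclude]


if __name__ == "__main__":
    docs = [
        "/home/rich/Dropbox/Publications/digital humanities/ada.txt",
        "/home/rich/notes/other.txt",
    ]
    target = "/home/rich/Dropbox/Publications/digital humanities"
    print(search(docs, target))                        # correct dir: filter
    print(search(docs, target, downcase_query=True))   # buggy dir: filter
    print(search(docs, target, exclude=True, downcase_query=True))  # buggy -dir:
```

In this model the downcased term XPdropbox never equals the stored XPDropbox, so no document counts as being inside the directory: the positive filter matches nothing and the inverted filter excludes nothing.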

     
  • medoc
    2018-01-08

    I have released recoll 1.23.7, which should fix this problem. Unfortunately, I have no ready means to build an arch package. I hope it won't take too long to trickle down. Maybe you can ping the maintainer...

     
  • Rich R
    2018-01-13

    Fixed in 1.23.7. Thank you!

     
    • medoc
      2018-01-14

      Thanks a lot for your patience and valuable input in helping me solve this!

      Cheers,

      jf

       
      Last edit: medoc 2018-01-14
  • medoc
    2018-01-14

    • status: open --> closed
     
