I used recollindex to index a large number of .eml files. One email had a text attachment. I ran recoll and entered a search term that appears in the email. When the search type is set to "message" it returns the email as a result. When the search type is set to "text" it returns the attachment as a result. When the search type is set to "all" it returns the attachment as a result. I expected to see both, or maybe just the email. The search term is not present in the text attachment.
screenshot of recoll gui attached
Discussion
-
medoc
2018-06-26
This is weird. Is there any possibility that you could share the eml file (jf@dockes.org)?
I'll be gone for a week, but I'd like to take a look at what happens.
Also, the output of the "show query" link might be useful.
-
Rich T
2018-06-27
I'll make a zipfile that has just the one email and a small one-message recoll DB and send it next week.
-
medoc
2018-07-07
- status: open --> closed
- milestone: -->
-
medoc
2018-07-07
Thanks for your help with this!
This should be fixed by commit https://opensourceprojects.eu/p/recoll1/code/ci/7b8ba96b25bcdb19ce7a29cc483023b7eae64b9f/tree/src/internfile/mh_mail.cpp?diff=7048d2a014479fd2dc82f0fab21093543d66a9b2
-
medoc
2018-07-07
Unfortunately, you do need to rebuild the index. The exact problem was that text/plain attachments had their parent's md5 checksum instead of their own, so they were eliminated during search when the "hide duplicates" option was set.
Thanks for your help in solving this! The best option if you want this fixed is to build from the RECOLL_1_24_MAINT branch. Otoh, once you know that the problem exists, it's probably possible to live with it (by toggling the search filters when in doubt).
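To make the checksum problem concrete, here is a rough Python sketch of how a "hide duplicates" filter keyed on md5 would collapse a message and its attachment when both carry the same checksum. This is not Recoll's actual code: the names `Doc`, `md5_of` and `hide_duplicates` are hypothetical, and the assumption that the filter keeps the first result seen for each checksum is mine, made only for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Doc:
    # Hypothetical stand-in for an index entry; not a Recoll type.
    name: str
    md5: str


def md5_of(data: bytes) -> str:
    """Return the hex md5 digest of some document content."""
    return hashlib.md5(data).hexdigest()


def hide_duplicates(results):
    """Keep only the first result seen for each checksum (assumed behavior)."""
    seen = set()
    kept = []
    for doc in results:
        if doc.md5 in seen:
            continue  # same checksum as an earlier result: treated as a duplicate
        seen.add(doc.md5)
        kept.append(doc)
    return kept


message_body = b"Body of the email, containing the search term."
attachment_body = b"Contents of the text/plain attachment."

# Buggy indexing: the attachment is stored with its parent's checksum.
buggy = [
    Doc("attachment.txt", md5_of(message_body)),
    Doc("message.eml", md5_of(message_body)),
]

# Fixed indexing: each document is stored with its own checksum.
fixed = [
    Doc("attachment.txt", md5_of(attachment_body)),
    Doc("message.eml", md5_of(message_body)),
]

buggy_shown = hide_duplicates(buggy)
fixed_shown = hide_duplicates(fixed)
```

With the shared checksum only one of the two entries survives the filter (which one depends on result order), while with distinct checksums both are shown. Since the checksum is stored at indexing time in this sketch, existing entries keep the wrong value until they are indexed again.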
-
Rich T
2018-07-07
Actually that's what I expected - but it seemed to be working (with your fix). Then I realized I still had "hide duplicates" set.