Status: open
Owner: nobody
Labels: None
Updated: 2018-01-09
Created: 2017-12-08
Creator: hedning
Private: No

Hi, I just found this great project. I've been looking for full-text web history search for a while.

At the moment the webextension relies on a download hack, which can be quite annoying and only enables communication from the browser to the backend. The webextension API has support for native messaging, which enables communication in both directions and makes the download hack unnecessary. In addition, it would open up the possibility of searching the index from the browser, greatly increasing usability.

If you're busy and it seems like too much work, but you find the idea interesting, I might take a look at it.

Discussion

  • medoc
    2017-12-08

    Hi,

    If you feel inclined to take a look, that would be really great.

    I am fully incompetent with Javascript and browser environments, I just pared down the Save Page WE extension until it did more or less what was needed.

    Using downloads is indeed very inconvenient, also because this necessitates another step to move the files to the actual place where recollindex expects them. A better extension should create them there directly.

    The mover script is here if you want to see what is finally expected: https://opensourceprojects.eu/p/recoll1/code/ci/216c69ff2d96edbb7cb85f8997cd60005a00dadc/tree/src/filters/recoll-we-move-files.py

    The strange format is itself inherited from the old Beagle extension. Nothing gets wasted here :)

    So yeah, please go for it !

    jf

     
    • hedning
      2017-12-08

      Thanks for the link. That helped me get going :)

      I'll try to sketch the necessary things to make this work here (much for my own benefit).

      webextension side

      The webextension will just need to pass along messages to the native application side.

      The code for sending messages is pretty simple:
      ```javascript
      port = chrome.runtime.connectNative("<name>");
      port.postMessage(message);
      ```

      This will launch the application and keep the connection open.

      Messages are encoded as JSON. Generating something like this in content.js, passing it through background.js and then on to the native application should work:

      ```javascript
      {
        type: 'index',
        url: window.location.toString(),
        title: document.title,
        data: document.documentElement.outerHTML
      }
      ```

      The webextension will also need an explicit addon id to be used by the native application manifest.

      recoll application side

      Instead of the script moving files from ~/Downloads we need a script which can receive messages. As luck would have it, I already have a Python script that can handle this (from an abandoned project using Solr; Solr is awful...). The messaging functionality could be added to the recoll-we-move-files.py script, creating files in ~/.recollweb/ToIndex using the message content.

      A native application manifest needs to be installed to let Firefox know about the native application.

      The manifest will look something like this, where <name> is the name of the python script:

      ```json
      {
        "name": "<name>",
        "description": "Example host for native messaging",
        "path": "/path/to/installed/python/script/<name>",
        "type": "stdio",
        "allowed_extensions": [ "<recollwe-addID>" ]
      }
      ```

      On Linux (it's been many years since I've used Windows, so I'm not well versed in that) the manifest needs to be installed here: /usr/lib/mozilla/native-messaging-hosts/<name>.json.

      On Windows a registry key needs to be set: HKEY_CURRENT_USER\SOFTWARE\Mozilla\NativeMessagingHosts\<name>.
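The manifest described above could be generated and written with a few lines of Python. This is only a sketch: the host name `recoll_host`, the add-on ID `recollwe@example.org`, the script path and both function names are hypothetical placeholders, and the Windows registry step is not covered.

```python
import json
from pathlib import Path


def build_manifest(name, script_path, extension_id):
    """Build the native application manifest as a plain dict."""
    return {
        "name": name,
        "description": "Example host for native messaging",
        "path": str(script_path),  # must be an absolute path to the script
        "type": "stdio",
        "allowed_extensions": [extension_id],
    }


def install_manifest(manifest, directory):
    """Write <name>.json into the given native-messaging-hosts directory."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / (manifest["name"] + ".json")
    target.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return target
```

For a system-wide install on Linux the directory would be /usr/lib/mozilla/native-messaging-hosts, for example `install_manifest(build_manifest("recoll_host", "/opt/recoll/recoll_host.py", "recollwe@example.org"), "/usr/lib/mozilla/native-messaging-hosts")`, which normally requires root.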

      The python script will need to parse the messages, look at the type (so it's possible to support search from the browser) and just save the data to an appropriately named file (along with the metadata), with no need for the intermediary file naming.
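The receiving side could be sketched roughly as follows. Native messaging frames each message as a 4-byte length in native byte order followed by UTF-8 JSON. Everything else here is an assumption for illustration: the function names, the `page-<timestamp>` file naming and the JSON metadata sidecar are hypothetical and do not reproduce the Beagle-style format that recollindex actually expects.

```python
import json
import struct
import sys
import time
from pathlib import Path

# Queue directory named in the thread; adjust to the local setup.
TO_INDEX = Path.home() / ".recollweb" / "ToIndex"


def read_message(stream):
    """Read one frame: 4-byte native-order length, then UTF-8 encoded JSON."""
    header = stream.read(4)
    if len(header) < 4:
        return None  # the browser closed the pipe
    (length,) = struct.unpack("=I", header)
    return json.loads(stream.read(length).decode("utf-8"))


def save_page(message, directory=TO_INDEX):
    """Store the page HTML plus a metadata sidecar (hypothetical naming)."""
    directory.mkdir(parents=True, exist_ok=True)
    stem = "page-%d" % time.time_ns()
    (directory / (stem + ".html")).write_text(message["data"], encoding="utf-8")
    meta = {"url": message["url"], "title": message["title"]}
    (directory / (stem + ".json")).write_text(json.dumps(meta), encoding="utf-8")
    return stem


def handle(message, directory=TO_INDEX):
    """Dispatch on the message type; only 'index' is handled in this sketch."""
    if message.get("type") == "index":
        return save_page(message, directory)
    return None


def main():
    """Process messages from the browser until stdin is closed."""
    while True:
        message = read_message(sys.stdin.buffer)
        if message is None:
            break
        handle(message)
```

A real host script would call `main()` at the end of the file. Dispatching on `type` leaves room for a later `search` message without changing the framing code.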

      Search from the browser

      With native messaging in place it should be possible to support searching from the browser, either from the address bar or from an extension popup. The address bar might be somewhat more convenient, but popups make it easier to present snippets properly, though it's of course possible to support both setups.

      Something like recoll -t -A -q <query>, where we get the query from a message, then parsing the result into JSON and sending it back, ought to work OK. It would include matches that aren't URLs too, though I'm guessing there's a way to filter based on source. Filtering out anything that's not text/html might work okay too.
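The reply path for such a search could look roughly like this. The sketch only covers building the command line quoted above, filtering by MIME type and framing the reply; parsing the actual output of recoll is left out because its exact format is not described in this thread. The function names and the shape of the result dicts (`mimetype`, `url`) are assumptions.

```python
import json
import struct


def build_recoll_command(query):
    """Argument list for the command line quoted in the post."""
    return ["recoll", "-t", "-A", "-q", query]


def keep_html(results):
    """Keep only results whose MIME type is text/html (hypothetical dicts)."""
    return [r for r in results if r.get("mimetype") == "text/html"]


def encode_message(obj):
    """Frame a reply: 4-byte native-order length, then UTF-8 encoded JSON."""
    payload = json.dumps(obj).encode("utf-8")
    return struct.pack("=I", len(payload)) + payload
```

The host would run the command with `subprocess.run(build_recoll_command(query), capture_output=True)`, turn the output into result dicts, and write `encode_message(...)` to `sys.stdout.buffer`, flushing afterwards so the browser receives the frame immediately.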

      Actually making it

      In principle nothing needs to be included on the recoll side, as all the native side needs to do is save files in the correct directory. This makes prototyping easier. Ideally, though, it's best included in recoll so installation is just webextension + recoll, and not webextension + recoll + native app.

      Hopefully I'll maintain the interest and put together a working prototype :)

       
  • medoc
    2017-12-08

    The thing which is not clear to me is what launches the receiving script. Hopefully firefox does it ?

    The web indexing has never worked under windows. It would be a worthy goal to make it work, but not an issue.

    There is a way to only retrieve the web results: add rclbes:BGL to the query string (this sort of means "recoll backend store : Beagle").
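Restricting a user query to web results then only needs the clause appended before the query is handed to recoll. A small sketch, with a hypothetical helper name:

```python
WEB_CLAUSE = "rclbes:BGL"


def web_only(query):
    """Append the rclbes:BGL clause unless the query already contains it."""
    if WEB_CLAUSE in query.split():
        return query
    return (query + " " + WEB_CLAUSE).strip()
```

For example, `web_only("firefox addons")` gives `firefox addons rclbes:BGL`, and applying it twice does not duplicate the clause.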

    If this works, of course, I will add the script to the recoll distribution.

     
    • hedning
      2017-12-08

      Ah, yes, firefox will automatically launch the receiving script when the extension sends messages to it (by looking at the absolute path in the native application manifest).

      rclbes:BGL is handy :)

      I also realized that with a new web extension, recoll could just as easily support both extensions. That might be a cleaner approach, which wouldn't break anyone's existing workflows.

       
  • hedning
    2017-12-09

    Okay, making a working prototype was easy: recoll-web. It should do the same thing as recoll-we, saving all pages (through messages of course), also saving when clicking on the browser action.

     
    Last edit: hedning 2017-12-09
  • hedning
    2017-12-14

    I've gotten basic search from within firefox up and running, using the python2 recoll module.

    It uses the omnibox API. Typing r search query in the address bar will provide up to 6 entries (a limitation in the API), where the first one will take you to a bare-bones local search page.

    I'm thinking of adding !bang support like DuckDuckGo, so it's easy to start a web search from the local search page. So e.g. some query !d will redirect to a DuckDuckGo search for the entered query.
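The bang handling described above could work roughly like this. The logic is shown in Python to match the native-side script; in the extension itself it would be a few lines of JavaScript. The `BANGS` table and the `resolve` name are hypothetical, and only the `d` entry comes from the post.

```python
from urllib.parse import quote_plus

# Hypothetical bang table; only the DuckDuckGo entry comes from the post.
BANGS = {"d": "https://duckduckgo.com/?q={}"}


def resolve(query):
    """Return a redirect URL if the query ends with a known !bang, else None."""
    words = query.split()
    if len(words) > 1 and words[-1].startswith("!"):
        template = BANGS.get(words[-1][1:])
        if template:
            # Everything before the bang is the search text.
            return template.format(quote_plus(" ".join(words[:-1])))
    return None
```

Queries without a known bang return `None`, so they fall through to the local recoll search.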

    Attached a screenshot of the very basic search page (didn't want to inline it for some reason).

     
    Last edit: hedning 2017-12-14
  • medoc
    2017-12-14

    Hi,

    I am sorry, I have been very busy, and will still be for the next few days. I do intend to try your module as soon as I can !

     
  • medoc
    2018-01-09

    Hi,

    So I finally got around to trying out the extension, and it's very nice !

    However, it's not really end-user-ready :(

    Do you intend to finalize it for easy installation and submitting to the mozilla add-ons repo ?

    There are quite a few cosmetic things to fix, like the content of the native app manifest, use of jq by the install script etc. I did not look at the code itself, just the wrapping :)

    I'd be quite willing to retire the current extension in favour of yours, but it needs a bit of polishing...

     
    • hedning
      2018-01-09

      Thanks for taking a look :) As you noted it's far from ready yet.

      For installation I think the ideal would be to bundle the native app with Recoll, so installation of the native app would be done by installing Recoll itself. This should work on most mainstream distros, I think. Unfortunately a standalone installer which installs to $HOME seems to be needed for any distro that doesn't use FHS, e.g. NixOS, which I'm running.

      I do intend to finish it and publish it to AMO, though I haven't worked on it much lately. Will let you know when it's in a more polished state.

       
