== Generating a custom field and using it to sort results
We are going to show how to generate a custom field from a Recoll filter
and use it to sort results. The example comes from an actual user request:
sorting results by PDF page count.

The details here are obsolete, as the +pdf+ input handler is now a quite
different Python program, but the general idea is still relevant.

The page count of a PDF file can be displayed by the +pdfinfo+ command
(from the xpdf or poppler tools).

We first modify a copy of the rclpdf filter
('/usr/[local/]share/recoll/filters/rclpdf') to compute the PDF page count
and output the value as an HTML meta field. This is a not very interesting
bit of shell/awk magic. Another approach would be to rewrite the rclpdf
filter in your favorite scripting language (e.g. Perl or Python), as all it
does is execute pdftotext and pdfinfo and output HTML, nothing complicated.
The rclpdf modification follows as a pseudo patch:
----
# compute the page count and format it so that it's alphabetically sortable
+set `pdfinfo "$infile" | egrep ^Pages:`
+pages=`printf "%04d" $2`
[skip...]
# Pass the page count value to awk
-awk 'BEGIN'\
+awk -v Pages="$pages" 'BEGIN'\
[skip...]
# Inside the awk program startup section: compute the "meta" field line
+ pagemeta = "<meta name=\"pdfpages\" content=\"" Pages "\">\n"
[skip...]
# Then print it as part of the header:
+ $0 = part1 charsetmeta pagemeta part2
[skip...]
----
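
The page count step can also be tried on its own, outside the filter. The
following is only a minimal sketch: the +pdfpages_meta+ function name is
hypothetical (it is not part of rclpdf), and the sample input stands in for
real +pdfinfo+ output, of which only the +Pages:+ line is assumed.

```shell
# Hypothetical helper, not part of rclpdf: reads pdfinfo-style output on
# stdin and prints the meta line, with the page count zero-padded to 4 digits.
pdfpages_meta() {
    awk '/^Pages:/ {
        printf "<meta name=\"pdfpages\" content=\"%04d\">\n", $2
        exit
    }'
}

# Sample input standing in for the output of: pdfinfo "$infile"
printf 'Title:          demo\nPages:          12\n' | pdfpages_meta
```

With a real file, the input would come from +pdfinfo "$infile" | pdfpages_meta+.
Doing the extraction and the padding in a single awk call avoids the
intermediate shell variables used in the patch, at the cost of departing
from the original filter structure.
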
You can execute your own version of rclpdf by modifying '~/.recoll/mimeconf':
----
[index]
application/pdf = exec /path/to/my/own/rclpdf
----
At this point, recollindex would receive and extract a +pdfpages+ field,
but it would not know what to do with it. We are going to tell it to store
the value inside the document data record so that it can be displayed in
the results, and sorted on. For this we modify the '~/.recoll/fields' file:
----
[stored]
pdfpages=
----
That's it! After reindexing, you can display +pdfpages+ inside the result
list (add a +%(pdfpages)+ value to the paragraph format), show it as a
column in the result table (right-click the table header), and sort the
results on page count (click the column header).
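
The zero-padding done in the filter matters for this sort. The sketch below
assumes that stored values are compared as strings, which is what the
"alphabetically sortable" comment in the patch suggests; it uses the
standard +sort+ command purely as an illustration, not Recoll itself.

```shell
# Plain numbers compared as strings: "120" sorts before "35" and "7"
printf '%s\n' 7 120 35 | LC_ALL=C sort

# Zero-padded to 4 digits: string order now matches numeric order
printf '%04d\n' 7 120 35 | LC_ALL=C sort
```

Four digits cover documents of up to 9999 pages; a larger collection limit
would need a wider format in the filter.
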
Note that +pdfpages+ has not been defined as searchable (this would not make
much sense). To make it searchable, you would have to define a prefix and
add it to the +[prefixes]+ section of the 'fields' file:
----
[prefixes]
pdfpages = XYPDFP
----
Have a look at the comments inside the 'fields' file for more information.