-a/website/recoll_XMP/index.txt
+b/website/recoll_XMP/index.txt
 ...
 handler, which differs a lot from doing something equivalent with the
 current Python-based one (for which XMP capability is available from
 recoll 1.23.2, but the new handler can be used with previous Recoll
 versions).
-This page was adapted from the text by Jeffrey Dick, using input from
+I based this page on the text by Jeffrey Dick, using input from Johannes
-Johannes Menzel, (especially the result list paragraph format),
+Menzel for all examples about the new features. The discussion which led to
-adapting things for the new handler. The discussion which led to the
-updated handler is a
+the updated handler is recorded in a
 link:https://bitbucket.org/medoc/recoll/issues/300/extracting-xmp-metadata-and-tmsu-tags[Bitbucket
 Recoll issue].
 == Introduction
 ...
 field is a good place to tag the PDF with any words of your choosing
 to describe genre, topic, etc.
 image::jabref_metadata.png[Editing metadata with jabref]
-== Custom indexing (fields file)
+== Custom indexing short example (fields file)
-Let's create two fields named "year" and "journal". The prefixes
+The following example (an extract from the complete configuration shown later)
-starting with "XY" are extension prefixes that are added to the terms
+creates two fields named "refjournal" and "refpages", which are both stored
-in the Xapian database (Recoll internally does not use prefixes
+(so they can be displayed in result list entries), and indexed (you can
-starting with XY). Additionally, the year and journal are stored so
+specifically search them).
-they can be displayed in the results list. Some other types of
-metadata, such as title, author and keywords, are already indexed by
+Some other types of metadata, such as title, author and keywords, are
-Recoll (the default rclpdf finds them using the *pdftotext*
+already indexed by Recoll (the default rclpdf finds them using the
-command) so there is no need to add those to the [prefixes] section.
+*pdftotext* command) so there is no need to add those to the [prefixes]
+section.
-Add this text to the fields file in your Recoll configuration
+This is taken from the `fields` file inside the configuration directory
-directory ('~/.recoll/fields').
+(e.g. '~/.recoll/fields').
 ----
 [prefixes]
-year = XYEAR
+refjournal=RFJOURNAL
-journal = XYJOUR
+refpages=RFPAGES
 [stored]
-bibtex:year =
+refjournal =
-bibtex:journal =
+refpages =
+[aliases]
+refjournal = bibtex:journal bibtex:journaltitle
+refpages = bibtex:pages
 ----
 == Telling the handler what fields to extract
-As of Recoll 1.23.2, the PDF handler has the capability to use
+As of Recoll 1.23.2, the PDF handler has the capability to use *pdfinfo*
-*pdfinfo* for extracting XMP metadata. The switch for executing *pdfinfo*
+for extracting XMP metadata. The switch for executing *pdfinfo* is the
-is the 'pdfextrameta' configuration parameter, and the value of the
+'pdfextrameta' configuration parameter, and the value of the parameter is a
-parameter is a list of XMP tags to extract, with optional conversion
+list of XMP tags to extract, with optional conversion to Recoll field names
-to Recoll field names (the XMP qualified tag name is kept by
+(the XMP qualified tag name is kept by default; the translation is
-default). Example:
+separated by a '|' character). Example (without translations):
 ----
-pdfextrameta =  bibtex:year bibtex:journal bibtex:booktitle|title
+pdfextrameta =  bibtex:year bibtex:journal bibtex:journaltitle
 ----
-Here, 'bibtex:year' and 'bibtex:journal' are used directly, and
+Note that it is quite equivalent to translate a field name inside
-'bibtex:booktitle' is translated to 'title' (the example is not
+'pdfextrameta' or to use aliases inside the 'fields' file.
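The equivalence between the two routes can be illustrated with a small Python sketch. This is not Recoll's own parsing code: the helper names 'parse_extrameta' and 'resolve_alias' are hypothetical, and the 'bibtex:journal|refjournal' value is only an illustrative translation. The sketch shows that translating a tag inside 'pdfextrameta' and keeping the tag name while declaring an alias both end up at the same field name.

```python
def parse_extrameta(value):
    """Map each XMP tag in a pdfextrameta-style value to a field name."""
    mapping = {}
    for item in value.split():
        # 'tag|field' renames the tag; a bare 'tag' keeps its qualified name.
        tag, _, field = item.partition('|')
        mapping[tag] = field or tag
    return mapping


def resolve_alias(name, aliases):
    """Return the canonical field for a name, following an [aliases]-like table."""
    for field, names in aliases.items():
        if name == field or name in names:
            return field
    return name


# Mirrors the [aliases] line for refjournal in the fields file example.
aliases = {'refjournal': ['bibtex:journal', 'bibtex:journaltitle']}

# Route 1: translate the tag inside pdfextrameta.
via_translation = parse_extrameta('bibtex:journal|refjournal')['bibtex:journal']

# Route 2: keep the qualified tag name and rely on the alias.
kept_name = parse_extrameta('bibtex:journal')['bibtex:journal']
via_alias = resolve_alias(kept_name, aliases)

print(via_translation, via_alias)
```

Both routes yield 'refjournal' here; using aliases keeps all the naming decisions in one file, which is the approach taken in the complete example.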
-supposed to make sense)
 == Editing the field values
 Shortly after the 1.23.2 release, the new rclpdf.py was modified to
 enable calling external Python code for editing the values of the XMP
 ...
             pass
         return txt
 ----
+The metadata-editing script can also fill in the "journal" field for BibTeX
+entries that aren't journal articles (e.g. from bibtex:booktitle for
+"InCollection" entries). To do this, define a 'wrapup()' method: it will be
+called with the whole metadata array (an array of '(nm,value)' pairs), so
+that entries can be edited, removed or added globally.
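As a rough illustration of the 'wrapup()' idea, the sketch below copies the book title into a journal entry when the document has no journal of its own. It assumes that 'wrapup()' receives the metadata as a mutable list of '(nm, value)' pairs and that changes made in place are picked up by the handler; check rclpdf.py for the exact calling convention in your Recoll version.

```python
class MetaFixer(object):
    def __init__(self):
        pass

    def metafix(self, nm, txt):
        # Per-field editing hook, left as a no-op in this sketch.
        return txt

    def wrapup(self, metadata):
        # Assumption: metadata is a mutable list of (nm, value) pairs and
        # the handler uses the list as modified here.
        names = {nm for nm, _ in metadata}
        if 'bibtex:journal' in names or 'bibtex:journaltitle' in names:
            return  # A journal is already present, nothing to do.
        for nm, value in list(metadata):
            if nm == 'bibtex:booktitle':
                # Reuse the book title as the journal for display purposes.
                metadata.append(('bibtex:journal', value))
                break
```

With this in place, an "InCollection" entry that only carries bibtex:booktitle would also get a bibtex:journal value, which the aliases in the 'fields' file then map to 'refjournal'.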
 == Indexing
 Then index away!
 Note that you can also run the rclpdf.py script manually,
 ...
 output. If things are working correctly, the <head> consists of the
 HTML meta elements, and the <body> contains the text of the PDF.
 == Result paragraph format
-Here, the result is formatted to show the title, which is a link
+The result paragraph format defines which fields are displayed in the Recoll
-to open the document, in blue with underlining turned off. The next
+result list, and how they are formatted.
-two lines contain the authors, then the journal title in green
-italicized text followed by year (in parentheses). The keywords are
-listed in red after the abstract/text snippet.
 Edit this using the Recoll GUI: Preferences > GUI configuration >
     Result List > Edit result paragraph format string.
 ----
 <table class="respar" style="padding-bottom: 10px;" cellspacing="5" cellpadding="5">
 ...
 </table>
 ----
-The screenshot below also has the 'Highlight color for query terms'
+There are
-set to `black; font-weight:bold;` for bold, black text (instead
-of the blue default). There
-are linkhttps://bitbucket.org/medoc/recoll/wiki/ResultsThumbnails[various
+link:https://bitbucket.org/medoc/recoll/wiki/ResultsThumbnails[various
-methods for creating the thumbnails]; the ones here were made by
+methods for creating the thumbnails]; the ones here were made by opening
-opening the directory containing the PDFs in the Dolphin file manager
+the directory containing the PDFs in the Dolphin file manager (part of KDE)
-(part of KDE) and selecting the Preview option.
+and selecting the Preview option.
+And the result:
-== A search example
+image::recoll_query.png[Result list display]
-The simple query is `cerevisiae keyword:protein`. This
-returns only PDFs that have the text "cerevisiae" and have been
-tagged with the "protein" keyword. The LaTeX-style formatting from
-the BibTeX database is displayed as HTML (note the italicized words
-in article title, and umlaut in author's name). Other queries could
-be made based on the PDF metadata, e.g. 'journal:plos'
-r 'year:2013'.
-image::recoll_query.png
 == More possibilities
 - The sort buttons (up- and down-arrows) in Recoll sort the
   results by the modified date on the file at the time of indexing. If
 ...
 Note that the publication year could then be shown in
 the result list using the stored date of the file (using "%D" in the
 result paragraph format, and date format "%Y") instead of having to
 add the year to the index as shown above.
- The filter can be modified to fill in the "journal" field for
-  BibTex entries that aren't journal articles (e.g. bibtex:booktitle
+== Complete example
-  for "InCollection" entries).
+This was designed by Johannes Menzel, who kindly provided the data when we
+worked on improving PDF XMP data extraction. The originals are listed in
+this
+link:https://bitbucket.org/medoc/recoll/issues/300/extracting-xmp-metadata-and-tmsu-tags[Bitbucket issue].
+The paragraph format is listed above.
+=== 'recoll.conf' additions:
+----
+pdfextrameta = bibtex:journal bibtex:journaltitle bibtex:pages \
+  bibtex:volume bibtex:number bibtex:booktitle bibtex:year bibtex:author \
+  bibtex:title bibtex:isbn bibtex:issn bibtex:editor bibtex:address \
+  bibtex:location bibtex:doi bibtex:chapter bibtex:url bibtex:entrytype \
+  bibtex:bibtexkey bibtex:abstract bibtex:date bibtex:keywords \
+  bibtex:comment bibtex:language bibtex:edition bibtex:totalpages \
+  dc:creator dc:relation dc:publisher dc:title dc:type dc:identifier
+defaultcharset = UTF-8//
+pdfextrametafix = /home/hannes/.recoll/metafix.py
+----
+=== 'metafix.py' script:
+----
+import sys
+import re
+# This can be used for local XMP field editing.
+#
+# A new instance is created for each PDF document (so the object could
+# keep state to avoid, e.g. duplicate values)
+#
+# The metafix method receives an (original) field name, and the text
+# value, and should return the possibly modified text.
+class MetaFixer(object):
+    def __init__(self):
+        pass
+    def metafix(self, nm, txt):
+        if nm == 'bibtex:pages':
+            txt = re.sub(r'--', '-', txt)
+            txt = re.sub(r'^', ', p. ', txt)
+        elif nm == 'bibtex:author':
+            txt = re.sub(r'$', r':\ ', txt)
+            pass
+        elif nm == 'bibtex:chapter':
+            txt = re.sub(r'^', ', in: id.: ', txt)
+            pass
+        elif nm == 'bibtex:editor':
+            txt = re.sub(r'^', ', in: ', txt)
+            txt = re.sub(r'$', r' (ed.):\ ', txt)
+            pass
+        elif nm == 'bibtex:year':
+            txt = re.sub(r'^', ', ', txt)
+            pass
+        elif nm == 'bibtex:date':
+            txt = re.sub(r'^', ', ', txt)
+            pass
+        elif nm == 'bibtex:volume':
+            txt = re.sub(r'^', ', vol. ', txt)
+            pass
+        elif nm == 'bibtex:number':
+            txt = re.sub(r'^', ', no. ', txt)
+            pass
+        elif nm == 'bibtex:journaltitle':
+            txt = re.sub(r'^', ', in: ', txt)
+            pass
+        elif nm == 'bibtex:journal':
+            txt = re.sub(r'^', ', in: ', txt)
+            pass
+        elif nm == 'bibtex:title':
+            txt = re.sub(r'^', '"', txt)
+            txt = re.sub(r'$', '"', txt)
+            pass
+        elif nm == 'bibtex:location':
+            txt = re.sub(r'^', ', ', txt)
+            txt = re.sub(r'$', r':\ ', txt)
+            pass
+        elif nm == 'bibtex:address':
+            txt = re.sub(r'^', ', ', txt)
+            txt = re.sub(r'$', r':\ ', txt)
+            pass
+        elif nm == 'bibtex:isbn':
+            txt = re.sub(r'^', 'ISBN: ', txt)
+            pass
+        elif nm == 'bibtex:issn':
+            txt = re.sub(r'^', 'ISSN: ', txt)
+            pass
+        elif nm == 'bibtex:doi':
+            txt = re.sub(r'^', 'DOI: ', txt)
+            pass
+        elif nm == 'bibtex:bibtexkey':
+            txt = re.sub(r'^', 'Key: ', txt)
+            pass
+        return txt
+----
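To see what the script above does to individual values, here is a condensed, table-driven sketch covering a few of the same fields. It is not the author's script: the 'DECORATIONS' table is a hypothetical restructuring, the sample values are made up, and the behaviour only matches for values without a trailing newline.

```python
# (prefix, suffix) decorations for a subset of the fields handled above.
DECORATIONS = {
    'bibtex:volume': (', vol. ', ''),
    'bibtex:number': (', no. ', ''),
    'bibtex:title': ('"', '"'),
    'bibtex:doi': ('DOI: ', ''),
}


class MetaFixer(object):
    def metafix(self, nm, txt):
        if nm == 'bibtex:pages':
            # BibTeX page ranges use '--'; display them with a single dash.
            return ', p. ' + txt.replace('--', '-')
        # Unknown fields get empty decorations and are returned unchanged.
        prefix, suffix = DECORATIONS.get(nm, ('', ''))
        return prefix + txt + suffix


fixer = MetaFixer()
print(fixer.metafix('bibtex:pages', '101--115'))    # , p. 101-115
print(fixer.metafix('bibtex:title', 'An Example'))  # "An Example"
print(fixer.metafix('bibtex:volume', '12'))         # , vol. 12
```

The decorated values are meant to be concatenated by the result paragraph format, which is why most of them start with a comma and a space.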
+=== 'fields' file:
+----
+[prefixes]
+refjournal=RFJOURNAL
+refpages=RFPAGES
+reftitle=RFTTITLE
+refvolume=RFVOLUME
+refauthor=RFAUTHOR
+refyear=RFYYEAR
+refisbn=RFISBN
+refissn=RFISSN
+refdoi=RFDOI
+refeditor=RFEDITOR
+refpublisher=RFPUBLISHER
+refaddress=RFADDRESS
+reflocation=RFLOCATION
+refbooktitle=RFBOOKTITLE
+refurl=RFURL
+reftype=RFTYPE
+refkey=RFKEY
+refabstract=RFABSTRACT
+refkeywords=RFKEYWORDS
+refcomment=RFCOMMENT
+refedition=RFEDITION
+reflanguage=RFLANGUAGE
+[stored]
+refjournal=
+refpages=
+reftitle=
+refvolume=
+refauthor=
+refyear=
+refisbn=
+refissn=
+refdoi=
+refeditor=
+refpublisher=
+refaddress=
+reflocation=
+refbooktitle=
+refurl=
+reftype=
+refkey=
+refabstract=
+refkeywords=
+refcomment=
+refedition=
+reflanguage=
+refid=
+[aliases]
+refjournal = bibtex:journal bibtex:journaltitle
+refpages = bibtex:pages
+reftitle = bibtex:title
+refvolume = bibtex:volume
+refauthor = bibtex:author
+refyear = bibtex:year bibtex:date
+refid = dc:identifier bibtex:isbn bibtex:issn
+refisbn = bibtex:isbn
+refissn = bibtex:issn
+refdoi = bibtex:doi
+refeditor = bibtex:editor
+refpublisher = bibtex:publisher
+refaddress = bibtex:address
+reflocation = bibtex:location
+refbooktitle = bibtex:booktitle
+refurl = bibtex:url
+reftype = bibtex:entrytype bibtex:type
+refkey = bibtex:bibtexkey
+refabstract = bibtex:abstract
+refkeywords = bibtex:keywords
+refcomment = bibtex:comment
+refedition = bibtex:edition
+reflanguage = bibtex:language
+author = xesam:author
+----