--- a/src/INSTALL
+++ b/src/INSTALL
...
   After an indexing pass, the commands that were found missing can be
   displayed from the recoll File menu. The list is stored in the missing
   text file inside the configuration directory.

   A list of common file types which need external commands follows. Many of
-  the filters need the iconv command, which is not always listed as a
+  the handlers need the iconv command, which is not always listed as a
   dependency.

   Please note that, due to the relatively dynamic nature of this
   information, the most up to date version is now kept on
   http://www.recoll.org/features.html along with links to the home pages or
...
   from the package repositories. However, the packages are sometimes
   outdated, or not the best version for Recoll, so you should take a look at
   http://www.recoll.org/features.html if a file type is important to you.

   As of Recoll release 1.14, a number of XML-based formats that were handled
-  by ad hoc filter code now use the xsltproc command, which usually comes
+  by ad hoc handler code now use the xsltproc command, which usually comes
   with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.

   Now for the list:

     o Openoffice files need unzip and xsltproc.
...

     o MS Word needs antiword. It is also useful to have wvWare installed as
       it may be used as a fallback for some files which antiword does not
       handle.

-    o MS Excel and PowerPoint need catdoc.
+    o MS Excel and PowerPoint are processed by internal Python handlers.

     o MS Open XML (docx) needs xsltproc.

     o Wordperfect files need wpd2html from the libwpd (or libwpd-tools on
       Ubuntu) package.
...

     o dvi files need dvips.

     o djvu files need djvutxt and djvused from the DjVuLibre package.

-    o Audio files: Recoll releases before 1.13 used the id3info command from
-      the id3lib package to extract mp3 tag information, metaflac (standard
-      flac tools) for flac files, and ogginfo (vorbis tools) for ogg files.
-      Releases 1.14 and later use a single Python filter based on mutagen
-      for all audio file types.
+    o Audio files: Recoll releases 1.14 and later use a single Python
+      handler based on mutagen for all audio file types.

     o Pictures: Recoll uses the Exiftool Perl package to extract tag
       information. Most image file formats are supported. Note that there
       may not be much interest in indexing the technical tags (image size,
       aperture, etc.). This is only of interest if you store personal tags
       or textual descriptions inside the image files.

-    o chm: files in microsoft help format need Python and the pychm module
+    o chm: files in Microsoft help format need Python and the pychm module
       (which needs chmlib).

     o ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar
       module. icalendar is not needed for newer versions, which use internal
       code.
...

     o Midi karaoke files need Python and the Midi module

     o Konqueror webarchive format with Python (uses the Tarfile module).

-    o mimehtml web archive format (support based on the email filter, which
+    o Mimehtml web archive format (support based on the email handler, which
       introduces some mild weirdness, but still usable).

   Text, HTML, email folders, and Scribus files are processed internally. Lyx
-  is used to index Lyx files. Many filters need iconv and the standard sed
+  is used to index Lyx files. Many handlers need iconv and the standard sed
   and awk.
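The helper commands named in the list above can be checked on the PATH ahead of an indexing pass. A minimal sketch, covering only the commands explicitly named in the visible entries (elided parts of the list are not included); the HELPERS table and the missing_helpers() name are illustrative, and this is not Recoll's own missing-command detection.

```python
import shutil

# Commands named in the visible list entries (partial: some entries are elided).
HELPERS = {
    "Openoffice": ["unzip", "xsltproc"],
    "MS Word": ["antiword"],
    "MS Open XML (docx)": ["xsltproc"],
    "Wordperfect": ["wpd2html"],
    "dvi": ["dvips"],
    "djvu": ["djvutxt", "djvused"],
    "many handlers": ["iconv", "sed", "awk"],
}


def missing_helpers(helpers=HELPERS):
    """Return {file type: [commands not found on PATH]}, omitting complete types."""
    report = {}
    for file_type, commands in helpers.items():
        absent = [cmd for cmd in commands if shutil.which(cmd) is None]
        if absent:
            report[file_type] = absent
    return report


if __name__ == "__main__":
    for file_type, absent in sorted(missing_helpers().items()):
        print(f"{file_type}: missing {', '.join(absent)}")
```

Run as a script, it prints one line per file type that lacks a helper; no output means every listed command was found.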

     ----------------------------------------------------------------------

   Prev                                        Up                        Next 
...

   zipSkippedNames

           A space-separated list of patterns for names of files or
           directories that should be ignored inside zip archives. This is
-          used directly by the zip filter, and has a function similar to
+          used directly by the zip handler, and has a function similar to
           skippedNames, but works independently. Can be redefined for
           filesystem subdirectories. For versions up to 1.19, you will need
-          to update the Zip filter and install a supplementary Python
+          to update the Zip handler and install a supplementary Python
           module. The details are described on the Recoll wiki.
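A minimal sketch of how a space-separated zipSkippedNames value could be applied to the members of a zip archive. It assumes shell-style wildcards matched against each path component, so a directory pattern also hides everything below it; the exact matching rules Recoll uses are not described here, and the function names are hypothetical.

```python
import fnmatch
import zipfile


def parse_patterns(value):
    """Split a space-separated zipSkippedNames value into individual patterns."""
    return value.split()


def is_skipped(member_name, patterns):
    """True if any path component of the member matches one of the patterns.

    Assumption: fnmatch-style wildcards, applied component by component.
    """
    parts = [p for p in member_name.split("/") if p]
    return any(fnmatch.fnmatchcase(part, pat) for part in parts for pat in patterns)


def indexable_members(zip_path, skipped_value):
    """List the file members of zip_path that survive the skip patterns."""
    patterns = parse_patterns(skipped_value)
    with zipfile.ZipFile(zip_path) as zf:
        return [info.filename for info in zf.infolist()
                if not info.is_dir() and not is_skipped(info.filename, patterns)]
```

For example, `indexable_members("archive.zip", "*.pyc .git __MACOSX")` would drop compiled Python files and anything under a `.git` or `__MACOSX` directory.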

   followLinks

           Specifies if the indexer should follow symbolic links while
...
           sections. It can not be changed below the topdirs level.

   indexedmimetypes

           Recoll normally indexes any file which it knows how to read. This
-          list lets you restrict the indexed mime types to what you specify.
+          list lets you restrict the indexed MIME types to what you specify.
           If the variable is unspecified or the list empty (the default),
           all supported types are processed. Can be redefined for
           subdirectories.
+
+  excludedmimetypes
+
+          This list lets you exclude some MIME types from indexing. Can be
+          redefined for subdirectories.
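Taken together, indexedmimetypes and excludedmimetypes act as an allow list and a deny list. A minimal sketch of the resulting decision, assuming (as stated above) that an unset or empty indexedmimetypes means "all supported types", and assuming that an exclusion wins when a type appears in both lists, which the text does not specify. The should_index() name and its arguments are illustrative.

```python
def should_index(mime_type, supported, indexed=None, excluded=None):
    """Decide whether a document of mime_type gets full-text indexing.

    supported: MIME types that have an input handler (illustrative set).
    indexed:   indexedmimetypes value; None or empty means no restriction.
    excluded:  excludedmimetypes value; None or empty means no exclusion.
    """
    if mime_type not in supported:
        return False
    if indexed and mime_type not in indexed:
        return False
    # Assumption: the exclusion list takes precedence over the allow list.
    if excluded and mime_type in excluded:
        return False
    return True
```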

   compressedfilemaxkbs

           Size limit for compressed (.gz or .bz2) files. These need to be
           decompressed in a temporary directory for identification, which
...
   indexallfilenames

           Recoll indexes file names in a special section of the database to
           allow specific file name searches using wildcards. This
           parameter decides if file name indexing is performed only for
-          files with mime types that would qualify them for full text
+          files with MIME types that would qualify them for full text
           indexing, or for all files inside the selected subtrees,
-          independently of mime type.
+          independently of MIME type.

   usesystemfilecommand

           Decide if we use the file -i system command as a final step for
-          determining the mime type for a file (the main procedure uses
+          determining the MIME type for a file (the main procedure uses
           suffix associations as defined in the mimemap file). This can be
           useful for files with suffix-less names, but it will also cause
           the indexing of many bogus "text" files.

   processwebqueue
...

   webcachemaxmbs

           This is only used by the web browser plugin indexing code, and
           defines the maximum size for the web page cache. Default: 40 MB.
+          Quite unfortunately, this is only taken into account when creating
+          the cache file. You need to delete the file for a change to be
+          taken into account.

   idxflushmb

           Threshold (megabytes of new text data) where we flush from memory
           to disk index. Setting this can help control memory usage. A value
...
           These allow defining the ionice class and data used by the indexer
           (default class 3, no data).

   filtermaxseconds

-          Maximum filter execution time, after which it is aborted. Some
+          Maximum handler execution time, after which it is aborted. Some
           postscript programs just loop...
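The time limit described by filtermaxseconds can be illustrated with Python's subprocess module: run an external handler on a file and give up when the limit expires. This is a minimal sketch of the general technique, not Recoll's execution code; the handlers directory, handler name and the single file-name argument are assumptions, and run_handler() is a hypothetical name.

```python
import os
import subprocess


def run_handler(handlers_dir, handler, path, max_seconds):
    """Run an external handler on path and return its stdout as bytes.

    Returns None if the handler runs longer than max_seconds or exits
    with a non-zero status.
    """
    cmd = [os.path.join(handlers_dir, handler), path]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=max_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the child before re-raising.
        return None
    if result.returncode != 0:
        return None
    return result.stdout
```

Returning None for both a timeout and a failure keeps the caller simple; a real indexer would probably want to record which of the two happened.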

   filtersdir

-          A directory to search for the external filter scripts used to
-          index some types of files. The value should not be changed, except
-          if you want to modify one of the default scripts. The value can be
-          redefined for any sub-directory.
+          A directory to search for the external input handler scripts used
+          to index some types of files. The value should not be changed,
+          except if you want to modify one of the default scripts. The value
+          can be redefined for any sub-directory.

   iconsdir

           The name of the directory where recoll result list icons are
           stored. You can change this if you want different images.
...
   [aliases]

           This section defines lists of synonyms for the canonical names
           used inside the [prefixes] and [stored] sections

-  filter-specific sections
+  handler-specific sections

-          Some filters may need specific configuration for handling fields.
-          Only the email message filter currently has such a section (named
-          [mail]). It allows indexing arbitrary email headers in addition to
-          the ones indexed by default. Other such sections may appear in the
-          future.
+          Some input handlers may need specific configuration for handling
+          fields. Only the email message handler currently has such a
+          section (named [mail]). It allows indexing arbitrary email headers
+          in addition to the ones indexed by default. Other such sections
+          may appear in the future.

   Here follows a small example of a personal fields file. This would extract
   a specific email header and use it as a searchable field, with data
-  displayable inside result lists. (Side note: as the email filter does no
+  displayable inside result lists. (Side note: as the email handler does no
   decoding on the values, only plain ascii headers can be indexed, and only
   the first occurrence will be used for headers that occur several times).

 [prefixes]
 # Index mailmytag contents (with the given prefix)
...
   translations from extended attributes names to Recoll field names. An
   empty translation disables use of the corresponding attribute data.

  5.4.3. The mimemap file

-  mimemap specifies the file name extension to mime type mappings.
+  mimemap specifies the file name extension to MIME type mappings.

   For file names without an extension, or with an unknown one, the system's
-  file -i command will be executed to determine the mime type (this can be
+  file -i command will be executed to determine the MIME type (this can be
   switched off inside the main configuration file).
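The lookup order described above (suffix table first, then `file -i` for a missing or unknown extension, unless that is switched off) can be sketched as follows. The MIMEMAP table is a hypothetical excerpt, not the shipped mimemap. Case-insensitive suffix matching is also an assumption. The sketch runs `file -b -i`, where `-b` drops the file-name prefix from the output; mime_type_for() is an illustrative name.

```python
import os
import subprocess

# Hypothetical excerpt of a mimemap table (suffix -> MIME type).
MIMEMAP = {
    ".txt": "text/plain",
    ".pdf": "application/pdf",
    ".blob": "application/x-blobapp",
}


def mime_type_for(path, mimemap=MIMEMAP, use_system_file_command=True):
    """Return a MIME type for path, or None if it cannot be determined."""
    # Assumption: suffixes are compared case-insensitively.
    suffix = os.path.splitext(path)[1].lower()
    if suffix in mimemap:
        return mimemap[suffix]
    if not use_system_file_command:
        return None
    try:
        out = subprocess.run(["file", "-b", "-i", path],
                             capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    # Output looks like "text/plain; charset=us-ascii": keep the type part.
    return out.split(";")[0].strip() or None
```

The use_system_file_command flag plays the role of the usesystemfilecommand setting from the main configuration file.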

   The mappings can be specified on a per-subtree basis, which may be useful
   in some cases. Example: gaim logs have a .txt extension but should be
   handled specially, which is possible because they are usually all located
...
   given Recoll version. Having it there avoids cluttering the more
   user-oriented and locally customized skippedNames.

  5.4.4. The mimeconf file

-  mimeconf specifies how the different mime types are handled for indexing,
+  mimeconf specifies how the different MIME types are handled for indexing,
   and which icons are displayed in the recoll result lists.

   Changing the parameters in the [index] section is probably not a good idea
   except if you are a Recoll developer.

...

   If Use desktop preferences to choose document editor is checked in the
   Recoll GUI preferences, all mimeview entries will be ignored except the
   one labelled application/x-all (which is set to use xdg-open by default).

-  In this case, the xallexcepts top level variable defines a list of mime
+  In this case, the xallexcepts top level variable defines a list of MIME
   type exceptions which will be processed according to the local entries
   instead of being passed to the desktop. This is so that specific Recoll
   options such as a page number or a search string can be passed to
   applications that support them, such as the evince viewer.
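The interplay between the desktop-preferences option, the application/x-all entry and xallexcepts reduces to a small decision rule. A minimal sketch, assuming the [view] entries are already loaded into a dict keyed by MIME type; pick_viewer() and the sample commands are hypothetical.

```python
def pick_viewer(mime_type, view, use_desktop_prefs, xallexcepts=()):
    """Return the viewer command template for mime_type, or None."""
    if use_desktop_prefs and mime_type not in xallexcepts:
        # Everything except the listed exceptions goes to the desktop opener.
        return view.get("application/x-all")
    # Exceptions (or desktop preferences off): use the local entry.
    return view.get(mime_type)
```

With `view = {"application/x-all": "xdg-open %f", "application/pdf": "pdfviewer --page=%p %f"}` and application/pdf listed in xallexcepts, PDF files keep the page-number option while every other type is handed to xdg-open.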

...
   non-default entries, which will override those from the central
   configuration file.

   All viewer definition entries must be placed under a [view] section.

-  The keys in the file are normally mime types. You can add an application
+  The keys in the file are normally MIME types. You can add an application
   tag to specialize the choice for an area of the filesystem (using a
   localfields specification in mimeconf). The syntax for the key is
   mimetype|tag
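A key of the form mimetype|tag can be resolved by trying the tagged key first and falling back to the plain MIME type. That fallback is an assumption made for this sketch (the text only says the tag specializes the choice). The parser is deliberately simplified: it ignores top-level entries and comments and does not handle continuation lines. Both function names are hypothetical.

```python
def parse_view_section(text):
    """Collect 'key = command' lines found under [view] in mimeview-style text."""
    entries, in_view = {}, False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            in_view = line == "[view]"
            continue
        if in_view and "=" in line:
            key, _, value = line.partition("=")
            entries[key.strip()] = value.strip()
    return entries


def lookup_view_entry(view, mime_type, tag=None):
    """Return the command for mime_type, preferring a 'mimetype|tag' entry."""
    if tag:
        specialized = view.get(f"{mime_type}|{tag}")
        if specialized is not None:
            return specialized
    return view.get(mime_type)
```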

   The nouncompforviewmts entry, (placed at the top level, outside of the
-  [view] section), holds a list of mime types that should not be
+  [view] section), holds a list of MIME types that should not be
   uncompressed before starting the viewer (if they are found compressed, ie:
   mydoc.doc.gz).

   The right side of each assignment holds a command to be executed for
   opening the file. The following substitutions are performed:
...
     o %i. Internal path, for subdocuments of containers. The format depends
       on the container type. If this appears in the command line, Recoll
       will not create a temporary file to extract the subdocument, expecting
       the called application (possibly a script) to be able to handle it.

-    o %M. Mime type
+    o %M. MIME type

     o %p. Page index. Only significant for a subset of document types,
       currently only PDF, Postscript and DVI files. Can be used to start the
       editor at the right page for a match or snippet.
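The substitutions can be sketched as a per-argument expansion of the command line. This minimal sketch only handles single-letter `%X` tokens such as the `%f`, `%i`, `%M`, `%p` and `%u` forms shown in this excerpt (the full list is partly elided here) and leaves unknown letters untouched. Splitting with shlex before substituting keeps a file name containing spaces as a single argument. The function name and the sample viewer commands are hypothetical.

```python
import re
import shlex


def expand_viewer_command(template, values):
    """Split a mimeview command line and substitute %X tokens in each argument.

    values maps a letter (e.g. "f", "p") to its replacement string;
    letters without a value are left as they are.
    """
    def substitute(arg):
        return re.sub(r"%([A-Za-z])",
                      lambda m: values.get(m.group(1), m.group(0)),
                      arg)

    return [substitute(arg) for arg in shlex.split(template)]
```

The resulting list can be passed directly to `subprocess.Popen`, so no shell quoting of the substituted values is needed.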

...
     o In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the
       following line:

 .blob = application/x-blobapp

-      Note that the mime type is made up here, and you could call it
+      Note that the MIME type is made up here, and you could call it
       diesel/oil just the same.

     o In $RECOLL_CONFDIR/mimeview under the [view] section, add:

 application/x-blobapp = blobviewer %f

       We are supposing that blobviewer wants a file name parameter here, you
       would use %u if it liked URLs better.

   If you just wanted to change the application used by Recoll to display a
-  mime type which it already knows, you would just need to edit mimeview.
+  MIME type which it already knows, you would just need to edit mimeview.
   The entries you add in your personal file override those in the central
   configuration, which you do not need to alter. mimeview can also be
   modified from the Gui.

    5.4.7.2. Adding indexing support for a new file type
...

     o Under the [icons] section, you should choose an icon to be displayed
       for the files inside the result lists. Icons are normally 64x64 pixels
       PNG files which live in /usr/[local/]share/recoll/images.

-    o Under the [categories] section, you should add the mime type where it
+    o Under the [categories] section, you should add the MIME type where it
       makes sense (you can also create a category). Categories may be used
       for filtering in advanced search.

-  The rclblob filter should be an executable program or script which exists
+  The rclblob handler should be an executable program or script which exists
   inside /usr/[local/]share/recoll/filters. It will be given a file name as
   argument and should output the text or html contents on the standard
   output.
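A minimal sketch of such a handler in Python, assuming the hypothetical .blob format is simply UTF-8 text. It reads the file named on the command line and writes an HTML document with the text escaped inside a `<pre>` element. The charset meta tag is included on the assumption that the consumer needs to be told the encoding. The function names are illustrative.

```python
#!/usr/bin/env python3
"""Hypothetical rclblob input handler: blob file in, HTML on stdout."""
import html
import sys


def text_to_html(text):
    """Wrap plain text in a minimal HTML page, escaping markup characters."""
    return ("<html><head>"
            '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
            "</head><body><pre>" + html.escape(text) + "</pre></body></html>\n")


def blob_to_html(path):
    """Read a blob file (assumed UTF-8 text) and return it as HTML."""
    with open(path, "rb") as f:
        text = f.read().decode("utf-8", errors="replace")
    return text_to_html(text)


if __name__ == "__main__":
    # The file name to convert is given as the single argument.
    if len(sys.argv) == 2:
        sys.stdout.write(blob_to_html(sys.argv[1]))
    else:
        sys.stderr.write("usage: rclblob <file>\n")
```

Decoding with `errors="replace"` keeps the handler from failing on stray bytes, at the cost of replacement characters in the indexed text.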

-  The filter programming section describes in more detail how to write a
-  filter.
+  The filter programming section describes in more detail how to write an
+  input handler.

     ----------------------------------------------------------------------

   Prev                                Up                                     
-  5.3. Building from source          Home                                    
+  5.3. Building from source          Home