a/website/features.html b/website/features.html
...
...
57
                <li><var class="literal">text</var>.</li>
57
                <li><var class="literal">text</var>.</li>
58
58
59
                <li><var class="literal">html</var>.</li>
59
                <li><var class="literal">html</var>.</li>
60
60
61
                <li><span class="application">OpenOffice</span>
61
                <li><span class="application">OpenOffice</span>
62
                files.</li>
62
                files (needs <b>unzip</b> command).</li>
63
63
64
                <li><var class="literal">maildir</var> and <var
64
                <li><var class="literal">maildir</var> and <var
65
            class="literal">mailbox</var> (<span class=
65
            class="literal">mailbox</var> (<span class=
66
            "application">Mozilla</span>, <span class=
66
            "application">Mozilla</span>, <span class=
67
            "application">Thunderbird</span> and <span class=
67
            "application">Thunderbird</span> and <span class=
...
...
120
        <li>Specific file name searches with wildcards.</li>
120
        <li>Specific file name searches with wildcards.</li>
121
121
122
        <li>Support for multiple charsets. Internal processing and
122
        <li>Support for multiple charsets. Internal processing and
123
          storage uses Unicode UTF-8.</li>
123
          storage uses Unicode UTF-8.</li>
124
124
125
      <li>Stemming performed at query time (can switch stemming
125
      <li><a href="#Stemming">Stemming</a> performed at query
126
        language after indexing).</li>
126
        time (can switch stemming language after indexing).</li>
127
127
128
        <li>Easy installation. No database daemon, web server or
128
        <li>Easy installation. No database daemon, web server or
129
          exotic language necessary.</li>
129
          exotic language necessary.</li>
130
130
131
        <li>An indexer which runs either as a thread inside the GUI
131
        <li>An indexer which runs either as a thread inside the GUI
132
          or as an external, cron'able program.</li>
132
          or as an external, cron'able program.</li>
133
      </ul>
133
      </ul>
134
    </dd>
134
    </dd>
135
      </ul>
135
      </ul>
136
136
137
      <h2><a name="Stemming"></a>Stemming</h2>
137
138
139
      <p>Stemming is a process which transforms inflected words into
140
      their most basic form. For example, <i>flooring</i>,
141
      <i>floors</i>, <i>floored</i> would probably all be transformed
142
      to <i>floor</i> by a stemmer for the English language.</p>
143
144
      <p>In many search engines, the stemming process occurs during
145
      indexing. The index will only contain the stemmed form of words,
146
      with exceptions for terms which are detected as being probably
147
      proper nouns (i.e. capitalized). At query time, the terms entered
148
      by the user are stemmed, then matched against the index.</p>
149
150
      <p>This process results in a smaller index, but it has the
151
  serious drawback of irrevocably losing information during
152
  indexing.</p>
153
154
      <p>Recoll works in a different way. No stemming is performed at
155
  indexing time, so that all information gets into the index. The
156
  resulting index is bigger, but most people probably don't care
157
  much about this nowadays, because they have a 100 GB disk 95%
158
  full of binary data <em>which does not get indexed</em>.</p>
159
      <p>At the end of an indexing pass, Recoll builds one or several
160
  stemming dictionaries, where all word stems are listed in
161
  correspondence to the list of their derivatives.</p>
162
163
      <p>At query time, by default, user-entered terms are stemmed,
164
  then matched against the stem database, and the query is
165
  expanded to include all derivatives. This will yield search
166
  results analogous to those obtained by a classical engine.
167
  The benefit of this approach is that stem expansion can be
168
  controlled instantly at query time in several ways:
169
  <ul>
170
  <li>It can be selectively turned off for any query term by
171
    capitalizing it (<i>Floor</i>).</li>
172
  <li>The stemming language (e.g. English, French...) can be
173
    selected (this supposes that several stemming databases have
174
    been built, which can be configured as part of the indexing,
175
    or done later, in a reasonably fast way).</li>
176
      </ul>
177
  
138
    </div>
178
    </div>
139
  </body>
179
  </body>
140
</html>
180
</html>
141
181
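
The query-time stem expansion described in the added Stemming section can be illustrated with a minimal sketch. This is not Recoll's implementation: `toy_stem`, `build_stem_db` and `expand_query_term` are hypothetical names, the suffix stripper is a crude stand-in for a real English stemmer, and the sketch assumes that index terms are stored in lowercase.

```python
from collections import defaultdict


def toy_stem(word):
    """Crude English suffix stripper (illustration only, not a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_stem_db(index_terms):
    """Map each stem to the set of unstemmed index terms that reduce to it.

    This mirrors the stemming dictionary built at the end of an indexing pass.
    """
    stem_db = defaultdict(set)
    for term in index_terms:
        stem_db[toy_stem(term)].add(term)
    return stem_db


def expand_query_term(term, stem_db):
    """Return the set of index terms to search for one user-entered term."""
    if term[:1].isupper():
        # A capitalized term turns stem expansion off for that term.
        # Assumption: the index itself holds lowercase terms.
        return {term.lower()}
    lowered = term.lower()
    # Stem the term, then expand it to all known derivatives of that stem.
    return set(stem_db.get(toy_stem(lowered), {lowered}))


# The index keeps the unstemmed words; nothing is lost at indexing time.
index_terms = ["floor", "floors", "floored", "flooring", "flood"]
stem_db = build_stem_db(index_terms)

print(sorted(expand_query_term("floors", stem_db)))
print(sorted(expand_query_term("Floor", stem_db)))
```

Because the index keeps the original words, switching to another stemming language only means building another stem dictionary from the same index terms, without reindexing the documents.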