On Wed, Mar 10, 2004 at 03:03:19PM -0600, Peter Ensch wrote:
> 1) I expected to capture 'this_word' from the path
> /path/to/my/site/this_word/file.htm
> but underscores are not included in the default
> WordCharacters. The string was not captured or was
> truncated.
You could use BuzzWords, but that would affect all metanames.
> 2) If stemming is turned on, this also affects what
> ExtractPath captures. I expected to capture
> 'relnotes' from the path
> /path/to/my/site/relnotes/file.htm
> but got 'relnot' instead.
That's another problem. It would be nice to set the fuzzy mode on a
per-metaname basis.
> For my purposes it would have been better if ExtractPath
> stored literal text (not SWISH 'words'), but perhaps in
> most cases this is what people want.
That happens at a higher level -- as far as the indexing code knows
you are just adding a <meta name="links" content="this_word">.
It's just code -- so a hack is always possible. In index.c look at
"index_path_parts" -- you could likely turn off stemming around the
indexstring() call. The fuzzy_mode is in:
sw->indexf->header.fuzzy_data->stemmer->fuzzy_mode
so save that locally, then set it to FUZZY_NONE, call indexstring() and
then reset.
That's just off the top of my head. No promises that it works...
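
Roughly, the save / set to FUZZY_NONE / call / reset idea looks like the
sketch below. It is not the real index.c code: the struct definitions are
stand-ins that mirror only the member chain quoted above, FUZZY_STEMMING_EN
is used here as a placeholder enum value, the indexstring() signature is
made up, and its "stemmer" is a toy that just drops a trailing "es". The
helper name index_path_part_literal() is hypothetical too.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in types (assumption): only the chain
   sw->indexf->header.fuzzy_data->stemmer->fuzzy_mode is mirrored. */
typedef enum { FUZZY_NONE = 0, FUZZY_STEMMING_EN } FuzzyMode;
typedef struct { FuzzyMode fuzzy_mode; } STEMMER;
typedef struct { STEMMER *stemmer; } FUZZY_OBJECT;
typedef struct { FUZZY_OBJECT *fuzzy_data; } INDEXHEADER;
typedef struct { INDEXHEADER header; } IndexFILE;
typedef struct { IndexFILE *indexf; } SWISH;

/* Toy stand-in for indexstring() (assumed signature): copies the word
   that would be indexed into out, dropping a trailing "es" when any
   fuzzy mode is active. */
static void indexstring(SWISH *sw, const char *text, char *out, size_t outlen)
{
    size_t len = strlen(text);

    assert(outlen > 0);
    if (sw->indexf->header.fuzzy_data->stemmer->fuzzy_mode != FUZZY_NONE
        && len > 4 && strcmp(text + len - 2, "es") == 0)
        len -= 2;                      /* crude "stemming" */
    if (len >= outlen)
        len = outlen - 1;              /* truncate to fit the buffer */
    memcpy(out, text, len);
    out[len] = '\0';
}

/* Hypothetical helper: index a path part literally by switching
   stemming off just for this one indexstring() call. */
static void index_path_part_literal(SWISH *sw, const char *text,
                                    char *out, size_t outlen)
{
    STEMMER *stemmer = sw->indexf->header.fuzzy_data->stemmer;
    FuzzyMode saved = stemmer->fuzzy_mode;   /* save locally */

    stemmer->fuzzy_mode = FUZZY_NONE;        /* stemming off */
    indexstring(sw, text, out, outlen);
    stemmer->fuzzy_mode = saved;             /* reset */
}
```

With the stand-ins above, calling indexstring() directly on "relnotes" with
a fuzzy mode set gives the stemmed form, while going through the helper
keeps the literal text and leaves the fuzzy mode as it was for everything
indexed afterwards.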
--
Bill Moseley
moseley@hank.org
Received on Wed Mar 10 13:23:33 2004