On Wed, Mar 10, 2004 at 11:08:54AM -0800, Bill Moseley wrote:
> On Wed, Mar 10, 2004 at 12:34:23PM -0600, Peter Ensch wrote:
>
> > I am concerned w/ the output however. The word names appear
> > truncated. I guess they are printf'ed for formatting purposes,
> > but this does limit their usefulness. Is there a way to
> > prevent this w/out hacking on the source code?
>
> I don't see that behavior. The code has:
>
> printf("\n%s",resultword);
>
> So that doesn't limit the length.
>
Of course you are correct; the printf does not truncate anything.
I did find the problem: words captured by an ExtractPath regex
have the same rules applied to them as any other indexed word.
In my case there were 2 unexpected factors at play:

1) I expected to capture 'this_word' from the path
     /path/to/my/site/this_word/file.htm
   but underscores are not included in the default
   WordCharacters. The string was not captured or was
   truncated.

2) If stemming is turned on, this also affects what
   ExtractPath captures. I expected to capture
   'relnotes' from the path
     /path/to/my/site/relnotes/file.htm
   but got 'relnot' instead.
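To illustrate both effects, here is a toy Python model. It is not
swish-e's code: WORD_CHARS, split_words and toy_stem are hypothetical
names, the character set is only an assumed stand-in for a default
that lacks the underscore, and the suffix stripping is a crude
substitute for a real stemmer.

```python
import re

# Assumed stand-in for a word-character set that lacks the underscore.
WORD_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789"


def split_words(text, word_chars=WORD_CHARS):
    """Return the runs of characters in text that belong to word_chars."""
    pattern = "[" + re.escape(word_chars) + "]+"
    return re.findall(pattern, text.lower())


def toy_stem(word):
    """Strip a trailing 'es' or 's'. A crude toy, not a real stemmer."""
    for suffix in ("es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Without the underscore, the path component falls apart into two words.
print(split_words("this_word"))                    # ['this', 'word']
# Adding the underscore to the set keeps it whole.
print(split_words("this_word", WORD_CHARS + "_"))  # ['this_word']
# Suffix stripping shortens the captured word.
print(toy_stem("relnotes"))                        # relnot
```

In this model, adding the underscore to the character set keeps
'this_word' whole, and skipping the stemming step keeps 'relnotes'
whole.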
For my purposes it would have been better if ExtractPath
stored the literal text (not SWISH 'words'), but perhaps the
current behavior is what most people want.
Regards,
Peter
--
^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^
Peter Ensch,
pensch@ti.com A-1140 (214) 480 2333
^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^
Received on Wed Mar 10 13:04:28 2004