Hi Bill,
On 22 Sep 2000, at 6:19, Bill Moseley wrote:
> At 01:20 AM 09/22/00 -0700, you wrote:
> >> Would that be very hard to implement?
> >>
> >Not really. I was thinking of a config directive like:
> >BumpPositionCounterCharacters |-()
>
> What about multiple characters or some way of saying bump on a period, but
> only if it's the end of a sentence? So,
>
> "It was expensive. The price was $5.24 at the local store."
>                  ^                 ^                 ^
>                Bump             No bump            Bump
>
> Maybe BumpPositionCounterCharacters would need to be a subset of
> IgnoreLastChar?
>
There is another possibility:
"It was expensive . The price"
Here there is a blank both before and after the period, so we have at
least four possibilities:
1- "word. word"
2- "word.word"
3- "word . word"
4- "word .word"
And what about this one:
5- "word... word"
I am not sure BumpPositionCounterCharacters needs to be a subset of
IgnoreLastChar; I do not think so. But the check in the code has to
handle a character that is also listed in IgnoreLastChar.
Basically, in the stripIgnoreLastChar function, if a character is stripped,
we also have to check whether the stripped character is in
BumpPositionCounterCharacters. If so, the position counter is incremented.
Additionally, we have to increment the counter if the next non-blank
character after a word is in BumpPositionCounterCharacters.
What do you think?
> >
> >I have also been working on your last posts
> >- Get StopWords (SwishStopWords function)
>
> Any way to get a switch to print the stopwords in the headers of the
> swish binary?
>
Sure
cu
Jose
Received on Fri Sep 22 14:27:03 2000