Re: Stop words and meta tags NOTE ADDED

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Sat Sep 16 2000 - 14:21:31 GMT
At 03:20 AM 09/16/00 -0700, you wrote:
>Agreed, one could index ALL stop words, but that would be extremely 
>inefficient, right?

Stop words are words that aren't indexed.  So if you index stop words such
as "yes" and "no" then they are not stop words anymore.

>[Actually, commenting out IgnoreWords wouldn't work either.  It would just 
>cause Swish to spend an inordinate amount of time calculating the frequency 
>of all of the potential stop words, as defined in config.h, and then 
>deleting them from the index, ending up in essentially the same 
>place.

Then don't use IgnoreLimit.  See:
http://sunsite.berkeley.edu/SWISH-E/archive/1848.html


>It would be better to allow a few specified terms, like yes/no, etc. as 
>"non" stop words.  Then you could search within meta or xml tags like a 
>real database.

Sounds like you are confused about stop words.  There are two kinds of
words: words in the index, and words that aren't in the index.  You can't
call "yes" and "no" stop words and still include them in the index.  Once
they are indexed, they are no longer stop words.

Again, use IgnoreWords and specify your stop words:
IgnoreWords a an the and

Now you can search for "yes" and "no", but not "a", "an", "the", or "and".
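
To make the idea concrete, here is a rough sketch of a toy indexer in
Python.  This is not how Swish-e is implemented; build_index and the sample
documents are made up purely for illustration.  It only shows that a word
on the ignore list never reaches the index, and anything else does:

```python
import re
from collections import defaultdict

# Hypothetical toy indexer for illustration only; not Swish-e's code.
def build_index(docs, ignore_words):
    """Map each non-stop word to the set of document names containing it."""
    stop = {w.lower() for w in ignore_words}
    index = defaultdict(set)
    for name, text in docs.items():
        # Lowercase the text and split it into simple alphanumeric words.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in stop:  # stop words are simply never indexed
                index[word].add(name)
    return dict(index)

# Made-up sample documents.
docs = {
    "doc1": "Yes, the answer is in an appendix",
    "doc2": "No, a reply and the answer differ",
}

# Same stop list as the IgnoreWords line above.
index = build_index(docs, ["a", "an", "the", "and"])
```

With that stop list, "yes" and "no" end up as keys in the index and can be
looked up, while "a", "an", "the" and "and" are not there at all.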

>Suggested variable names: "SpecialWords" "Override_Stop_Words" 
>"StopWordsNOT" ???
>
>At 02:34 PM 9/15/00, you wrote:
>>At 02:12 PM 09/15/00 -0700, Frank Heasley wrote:
>> >Although stop words are important, there is no provision (that I'm aware
>> >of) that can override them.
>>
>>http://sunsite.berkeley.edu/SWISH-E/Manual/config.user.html
>>
>>#IgnoreWords SwishDefault
>># The IgnoreWords option allows you to specify words to ignore.
>># Comment out for no stopwords; the word "SwishDefault" will
>># include a list of default stopwords. Words should be separated by spaces
>># and may span multiple directives.
>>
>>
>>
>>Bill Moseley
>>mailto:moseley@hank.org
>
>
>

Bill Moseley
mailto:moseley@hank.org
Received on Sat Sep 16 14:24:50 2000