Re: Stop words and meta tags NOTE ADDED

From: Frank Heasley <DrHeasley(at)not-real.chemistry.com>
Date: Sat Sep 16 2000 - 10:22:26 GMT
Agreed, one could index ALL stop words, but that would be extremely 
inefficient, right?

[Actually, commenting out IgnoreWords wouldn't work either.  It would just 
cause Swish to spend an inordinate amount of time calculating the frequency 
of all of the potential stop words, as defined in config.h, and then 
deleting them from the index, ending up in essentially the same 
place.  Redefining the default stop word frequency as 100% wouldn't do it 
either, because it would not reliably ensure that the specific words you 
need would get indexed, as they very likely would occur in 100% of the 
files.  And if you set the default frequency to 0% you end up with an empty 
index because ALL of the words would be stop words.]

It would be better to allow a few specified terms, such as yes/no, to be 
designated as "non" stop words.  Then you could search within meta or xml 
tags like a real database.

Suggested variable names: "SpecialWords" "Override_Stop_Words" 
"StopWordsNOT" ???
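
To make the idea concrete, here is a rough sketch in Python of what such an override might look like.  This is not Swish code and not how Swish computes stop words internally; the function build_index, the stop_threshold parameter, and the keep_words list are all hypothetical names made up for illustration, and the ">=" comparison against the threshold is an assumption.

```python
from collections import Counter


def build_index(docs, stop_threshold=0.8, keep_words=()):
    """Index docs, dropping words that occur in too large a share of files.

    docs: dict mapping file name -> text.
    stop_threshold: share of files (0.0-1.0) at or above which a word is
        treated as a stop word (assumed rule, not Swish's actual one).
    keep_words: hypothetical override list; these words are always indexed.
    """
    keep = {w.lower() for w in keep_words}
    tokens = {name: set(text.lower().split()) for name, text in docs.items()}

    # Document frequency: in how many files does each word occur?
    doc_freq = Counter(word for words in tokens.values() for word in words)

    index = {}
    for name, words in tokens.items():
        for word in words:
            share = doc_freq[word] / len(docs)
            if share >= stop_threshold and word not in keep:
                continue  # frequent word with no override: skip it
            index.setdefault(word, set()).add(name)
    return index


docs = {
    "a.html": "approved yes the report",
    "b.html": "approved no the memo",
    "c.html": "approved yes the letter",
}

# Without an override, "yes" (2 of 3 files) is dropped as a stop word.
plain = build_index(docs, stop_threshold=0.6)

# With the override list, "yes" and "no" survive regardless of frequency.
with_override = build_index(docs, stop_threshold=0.6, keep_words=["yes", "no"])
```

In this toy model a threshold of 0.0 empties the index and a threshold of 1.0 still drops any word that occurs in every file, which is the problem described above; only the explicit keep list guarantees that a given word is indexed.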

At 02:34 PM 9/15/00, you wrote:
>At 02:12 PM 09/15/00 -0700, Frank Heasley wrote:
> >Although stop words are important, there is no provision (that I'm aware
> >of) that can override them.
>
>http://sunsite.berkeley.edu/SWISH-E/Manual/config.user.html
>
>#IgnoreWords SwishDefault
># The IgnoreWords option allows you to specify words to ignore.
># Comment out for no stopwords; the word "SwishDefault" will
># include a list of default stopwords. Words should be separated by spaces
># and may span multiple directives.
>
>
>
>Bill Moseley
>mailto:moseley@hank.org
Received on Sat Sep 16 10:26:00 2000