
Re: Win32 PHRASE search

From: Jose Manuel Ruiz <jmruiz(at)not-real.boe.es>
Date: Mon Apr 24 2000 - 17:45:22 GMT
Bill Moseley wrote:
> 
> Hi Jose,

Hi Bill,
> 
> I've commented on these things before:
> 
> Will you be looking at the word position counting?  I really think that
> word positions should be based only on words as defined by swish, and
> not bumping the position counter for ending periods or any other
> punctuation, or even stop words.
> 

Well, this was the easiest way: I did not have to worry about anything
but an OK word. I will fix it in the next release.

More about stop words...

In config.h you can find the following line:

#define IGNORE_STOPWORDS_IN_QUERY 1

If enabled, stop words are ignored in the query; in fact, they are
removed from it. So, for phrase searches to work, the position counter
must not be bumped in index.c when a stop word is found. But if we do
not set IGNORE_STOPWORDS_IN_QUERY and the position counter is not
bumped when a stop word is found while indexing, then the phrase cannot
be found in a search, because the stop words are not removed from the
query.

So, I am wondering whether IGNORE_STOPWORDS_IN_QUERY makes any sense
now. It always has to be enabled!!

I think the solution is:

- IGNORE_STOPWORDS_IN_QUERY must always be enabled!!
- The position counter must not be bumped when stop words are found
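A minimal C sketch of the two rules above, assuming words are already lowercased and using a small hypothetical stop-word list (not swish-e's real list or data structures). The indexer skips stop words without bumping the position counter, and the `ignore_stops` flag stands in for IGNORE_STOPWORDS_IN_QUERY. `index_words` and `phrase_found` are illustrative names, not functions from index.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORDS 64

/* Hypothetical stop-word list; words are assumed to be lowercase already. */
static const char *const STOP_WORDS[] = { "a", "an", "and", "at", "of", "the", NULL };

typedef struct { const char *word; int pos; } Posting;

static bool is_stop_word(const char *w) {
    for (size_t i = 0; STOP_WORDS[i] != NULL; i++)
        if (strcmp(w, STOP_WORDS[i]) == 0) return true;
    return false;
}

/* Indexing: stop words are dropped and do NOT bump the position counter. */
static size_t index_words(const char *doc[], size_t n, Posting out[]) {
    size_t k = 0;
    int pos = 0;
    assert(n <= MAX_WORDS);
    for (size_t i = 0; i < n; i++) {
        if (is_stop_word(doc[i])) continue;
        out[k].word = doc[i];
        out[k].pos = ++pos;
        k++;
    }
    return k;
}

/* Phrase search: the remaining query terms must occupy consecutive
   positions. ignore_stops plays the role of IGNORE_STOPWORDS_IN_QUERY:
   when it is false, a stop word stays in the query and can never match,
   because it was never indexed. */
static bool phrase_found(const char *doc[], size_t ndoc,
                         const char *query[], size_t nq, bool ignore_stops) {
    Posting idx[MAX_WORDS];
    const char *terms[MAX_WORDS];
    size_t nidx = index_words(doc, ndoc, idx);
    size_t nterms = 0;

    assert(nq <= MAX_WORDS);
    for (size_t i = 0; i < nq; i++)
        if (!(ignore_stops && is_stop_word(query[i])))
            terms[nterms++] = query[i];
    if (nterms == 0) return false;

    for (size_t s = 0; s + nterms <= nidx; s++) {
        size_t j = 0;
        while (j < nterms && strcmp(idx[s + j].word, terms[j]) == 0 &&
               idx[s + j].pos == idx[s].pos + (int)j)
            j++;
        if (j == nterms) return true;
    }
    return false;
}
```

With the document `university of california at berkeley`, the query `university of california` matches only when `ignore_stops` is true; with it false, `of` remains in the query and is never found in the index.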

For periods and other punctuation, the position will not be bumped in
the next release. That way, for example, searching for:

search -w 'metaname="Berkeley University"' -f index.file

will find:

Berkeley University  (OK)

Berkeley, University (OK)

Berkeley. University (OK?)

Berkeley.
University (OK?)
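The intended punctuation behaviour can be sketched in C as follows. It assumes, hypothetically, that only letters and digits are word characters (swish-e's actual set comes from its configuration) and that the phrase is tokenized exactly like the indexed text, so periods, commas and newlines separate words without consuming a position. `tokenize` and `phrase_matches` are illustrative names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS 64
#define MAX_LEN 32

typedef struct { char word[MAX_LEN]; int pos; } Token;

/* Assumption: only letters and digits are word characters. */
static int is_word_char(char c) { return isalnum((unsigned char)c); }

/* Split text into lowercase words. Punctuation and whitespace only separate
   words; they never bump the position counter. Overlong words are truncated. */
static size_t tokenize(const char *text, Token out[]) {
    size_t n = 0;
    int pos = 0;
    assert(text != NULL);
    while (*text != '\0' && n < MAX_TOKENS) {
        while (*text != '\0' && !is_word_char(*text)) text++;  /* skip separators */
        if (*text == '\0') break;
        size_t len = 0;
        while (is_word_char(*text)) {
            if (len < MAX_LEN - 1)
                out[n].word[len++] = (char)tolower((unsigned char)*text);
            text++;
        }
        out[n].word[len] = '\0';
        out[n].pos = ++pos;
        n++;
    }
    return n;
}

/* The phrase is tokenized exactly like the text; its words must then occupy
   consecutive positions in the text. */
static int phrase_matches(const char *text, const char *phrase) {
    Token t[MAX_TOKENS], q[MAX_TOKENS];
    size_t nt = tokenize(text, t), nq = tokenize(phrase, q);
    if (nq == 0) return 0;
    for (size_t s = 0; s + nq <= nt; s++) {
        size_t j = 0;
        while (j < nq && strcmp(t[s + j].word, q[j].word) == 0 &&
               t[s + j].pos == t[s].pos + (int)j)
            j++;
        if (j == nq) return 1;
    }
    return 0;
}
```

All four examples in the list above would match `Berkeley University` under this sketch. Whether a match across a sentence boundary (the "OK?" cases) is actually desirable is a separate policy question; the sketch simply accepts it.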

> Plus, I really think that swish should parse text on searching exactly like
> it does on indexing.  Otherwise, it is very confusing as you can't search
> for text cut directly from the source document and expect it to work.  That
> means the wordchars, ignore first and last, and other settings would need
> to be saved in the index file (just like the Use Stemming: setting).
>

Yes, it should work that way. But this could be a major change. Let me
look at the code... There are other things that could also be included
in the index file.
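A sketch in C of saving the parsing settings in the index file so that the search side can parse queries the same way. The setting names mirror swish-e's WordCharacters, IgnoreFirstChar and IgnoreLastChar directives, but the `Key: value` header layout and the `write_settings`, `read_settings` and `trim_word` helpers are hypothetical, not the real index format. The search program reads the settings back and applies the same first/last-character trimming to query words that the indexer applied to document words.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SETTING_LEN 128

/* Hypothetical set of tokenizer settings that must match at index and search time. */
enum { WORD_CHARS, IGNORE_FIRST, IGNORE_LAST, NSETTINGS };
static const char *const KEYS[NSETTINGS] = {
    "WordCharacters:", "IgnoreFirstChar:", "IgnoreLastChar:"
};

typedef struct { char value[NSETTINGS][SETTING_LEN]; } ParseSettings;

/* Indexer side: write one "Key: value" line per setting (hypothetical layout). */
static void write_settings(FILE *f, const ParseSettings *s) {
    assert(f != NULL);
    for (int i = 0; i < NSETTINGS; i++)
        fprintf(f, "%s %s\n", KEYS[i], s->value[i]);
}

/* Search side: read the settings back. Returns how many keys were found. */
static int read_settings(FILE *f, ParseSettings *s) {
    char line[SETTING_LEN + 32];
    int found = 0;
    memset(s, 0, sizeof *s);
    while (fgets(line, sizeof line, f) != NULL) {
        line[strcspn(line, "\n")] = '\0';
        for (int i = 0; i < NSETTINGS; i++) {
            size_t klen = strlen(KEYS[i]);
            if (strncmp(line, KEYS[i], klen) == 0 && line[klen] == ' ') {
                snprintf(s->value[i], SETTING_LEN, "%s", line + klen + 1);
                found++;
            }
        }
    }
    return found;
}

/* Apply the same IgnoreFirstChar/IgnoreLastChar trimming used at index
   time to a query word, in place. */
static void trim_word(const ParseSettings *s, char *w) {
    size_t len = strlen(w), start = 0;
    while (start < len && strchr(s->value[IGNORE_FIRST], w[start]) != NULL) start++;
    while (len > start && strchr(s->value[IGNORE_LAST], w[len - 1]) != NULL) len--;
    memmove(w, w + start, len - start);
    w[len - start] = '\0';
}
```

Storing the values next to settings such as `Use Stemming:` means a word cut directly from a source document, for example `-berkeley.`, is trimmed to `berkeley` at search time exactly as it was at index time.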

> I don't think that phrases should span meta fields, either.  It seemed like
> I could search for "two words" and if "two" was the last word in one meta
> field, and "words" was the first word in the next field it would find a
> match.  That shouldn't be like that.
> 

Yes, you are right. In the function parseMetaData, the position is
always set to 1 each time it is called. Perhaps there is a bug. Anyway,
I will check it.
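One way to keep phrases inside a single meta field, sketched in C: each word occurrence records its meta field and its position within that field (restarting at 1, as described for parseMetaData above), and the phrase check compares the meta ID as well as the position. The `Occurrence` struct and `phrase_in_field` are hypothetical. With `check_meta` false, a word at position 1 of one field and a word at position 2 of another field look adjacent, a cross-field match similar to the one Bill describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One occurrence of a word: the meta field it came from and its position
   inside that field (restarting at 1 in each field). */
typedef struct { const char *word; int meta_id; int pos; } Occurrence;

/* Returns true if phrase[0..n-1] appears at consecutive positions within a
   single meta field. If check_meta is false, only positions are compared,
   so words from different fields can be stitched into a false match. */
static bool phrase_in_field(const Occurrence occ[], size_t nocc,
                            const char *phrase[], size_t n, bool check_meta) {
    assert(n > 0);
    for (size_t a = 0; a < nocc; a++) {
        if (strcmp(occ[a].word, phrase[0]) != 0) continue;
        size_t j = 1;
        for (; j < n; j++) {
            bool hit = false;
            for (size_t b = 0; b < nocc && !hit; b++)
                hit = strcmp(occ[b].word, phrase[j]) == 0 &&
                      occ[b].pos == occ[a].pos + (int)j &&
                      (!check_meta || occ[b].meta_id == occ[a].meta_id);
            if (!hit) break;
        }
        if (j == n) return true;
    }
    return false;
}
```

Comparing the meta ID rules out cross-field matches whether positions restart in each field or run on from one field to the next.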


-- 

Jose Manuel Ruiz Ramos

jmruiz@boe.es
Received on Mon Apr 24 13:47:01 2000