PHRASE Search (resend)

From: Jose Manuel Ruiz <jmruiz(at)not-real.boe.es>
Date: Mon Apr 17 2000 - 08:16:40 GMT
Sorry, I misspelled the address of the swish-e list, and this
message has been waiting here since Friday.

So here it is

Bill,

> 
> BTW -- one time when I was watching swish search the index it looked like
> if you searched on, say, Beer, it would quickly locate the start of the "B"
> words and then sequentially search through those words for Beer.  Is that
> the way it was searching or am I mistaken?  Seemed like a bad search method.
> 

You are right. That is the way it works. It is very useful when searching
for "b*", because it makes it easy to expand the "*" character into a
list of words joined by "or".
There is also a function (getfileinfo in search.c) which has to read
all the words until it finds the info for the word you are looking for.
This is slow when you have to search for many words and your index file
is big. Again, look at "b*": it is expanded to "back or bad or bat or ...
or beer ...", and then getfileinfo is invoked for each word in order to
resolve the query. Let us say that you have 500 words starting with "b";
then you have to execute getfileinfo 500 times and reread the
index file 500 times. As you can imagine, reading the last "b*" word
means reading all the previous "b*" words.
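
The cost described above can be made concrete with a small sketch. This is
not the swish-e code: the in-memory array demo_words stands in for the sorted
word section of the index file, and the names first_with_letter,
sequential_lookup_cost and wildcard_total_cost are hypothetical. It assumes
each lookup restarts at the first word with the same initial letter and reads
entries one by one until it reaches the target.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the sorted word list stored in the index file. */
static const char *const demo_words[] = {
    "apple", "back", "bad", "bat", "beer", "cat"
};
static const size_t demo_word_count =
    sizeof demo_words / sizeof demo_words[0];

/* Position of the first word starting with `letter`, or n if there is none. */
size_t first_with_letter(const char *const *words, size_t n, char letter)
{
    size_t i = 0;
    while (i < n && words[i][0] != letter)
        i++;
    return i;
}

/* Scan forward from `start` until `target` is found.
 * Returns the number of entries read, or 0 if the word is absent. */
size_t sequential_lookup_cost(const char *const *words, size_t n,
                              size_t start, const char *target)
{
    assert(target != NULL);
    for (size_t i = start; i < n && words[i][0] == target[0]; i++) {
        if (strcmp(words[i], target) == 0)
            return i - start + 1;
    }
    return 0;
}

/* Total entries read when "<letter>*" is expanded into one lookup per
 * matching word, each lookup rescanning from the start of the block. */
size_t wildcard_total_cost(const char *const *words, size_t n, char letter)
{
    size_t start = first_with_letter(words, n, letter);
    size_t total = 0;
    for (size_t i = start; i < n && words[i][0] == letter; i++)
        total += sequential_lookup_cost(words, n, start, words[i]);
    return total;
}
```

With the four "b" words in demo_words, the lookups read 1 + 2 + 3 + 4 = 10
entries in total. Under the same simplified model, 500 words starting with
"b" would need 500 * 501 / 2 = 125250 entry reads, which is why the scan
gets slow on a big index.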

Well, that was the way it worked. Now, in PHRASE SEARCH, I have added a
hash index to the index file. "b*" is resolved in the same way as before,
but getfileinfo now looks up the info using a hash search, saving file I/O.
Look at hash.c. You can change SEARCHHASHSIZE in swish.h to
increase or decrease the size of the word hash table.
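
The idea of a hash lookup replacing the sequential scan can be sketched as
follows. This is an illustration only, not the contents of hash.c: the
struct layout, the hash function, and the names word_table_put,
word_table_get and word_table_free are assumptions, and DEMO_HASH_SIZE is a
hypothetical constant playing the role that SEARCHHASHSIZE plays in swish.h.
Each word maps to the offset of its info, so a lookup touches one bucket
instead of every preceding word.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_HASH_SIZE 101 /* hypothetical; analogous in role to SEARCHHASHSIZE */

/* One chained entry: a word and the (assumed) file offset of its info. */
struct word_entry {
    char *word;
    long offset;
    struct word_entry *next;
};

struct word_table {
    struct word_entry *buckets[DEMO_HASH_SIZE];
};

/* Simple multiplicative string hash, chosen only for the example. */
static unsigned demo_hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h % DEMO_HASH_SIZE;
}

/* Insert a word. Returns 0 on success, -1 if memory runs out.
 * A repeated word is not merged; the newest entry shadows older ones. */
int word_table_put(struct word_table *t, const char *word, long offset)
{
    assert(t != NULL && word != NULL);
    struct word_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->word = malloc(strlen(word) + 1);
    if (e->word == NULL) {
        free(e);
        return -1;
    }
    strcpy(e->word, word);
    e->offset = offset;
    unsigned b = demo_hash(word);
    e->next = t->buckets[b];
    t->buckets[b] = e;
    return 0;
}

/* Return the stored offset for a word, or -1 if it is not in the table. */
long word_table_get(const struct word_table *t, const char *word)
{
    assert(t != NULL && word != NULL);
    const struct word_entry *e = t->buckets[demo_hash(word)];
    for (; e != NULL; e = e->next) {
        if (strcmp(e->word, word) == 0)
            return e->offset;
    }
    return -1;
}

/* Release every entry and reset the buckets. */
void word_table_free(struct word_table *t)
{
    for (size_t i = 0; i < DEMO_HASH_SIZE; i++) {
        struct word_entry *e = t->buckets[i];
        while (e != NULL) {
            struct word_entry *next = e->next;
            free(e->word);
            free(e);
            e = next;
        }
        t->buckets[i] = NULL;
    }
}
```

A larger table size means shorter chains per bucket at the price of more
memory, which is presumably the trade-off behind making the size tunable.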

Have a nice day

Jose Manuel Ruiz Ramos

jmruiz@boe.es
Received on Mon Apr 17 04:19:00 2000