Re: Future versions of swish-e

From: Jason Birch <jbirch(at)not-real.psp.pair.com>
Date: Sat Jun 10 2000 - 07:29:20 GMT
On Fri, 9 Jun 2000 06:49:31 -0700 (PDT), Jose Manuel Ruiz
<jmruiz@boe.es> spoke:

> As you know, I have made several modifications to swish-e.
> I think the package is complete to satisfy my own requirements
> but it could be improved in future versions if there are enough
> people interested in it.

I like the feature list of what you have done...  I've tried out
David's Win32 compile of the code, and it's only indexing the title of
the document, not the meta tags or the actual content.  Unfortunately,
I don't have the compiler or debugging environment needed to figure
this out.
 
> 1 - Compression of Filenames and Properties. I think that the zlib
> package can be used. I have never used it before and I do not
> know if it can fit the requirements.

Please only do this if it has little or no impact on search
performance.  I couldn't care less how long it takes or how much memory
is used up indexing; the critical thing is getting the data to the
user as fast as possible and with minimal load on the server.  I'm on
a shared server, and don't want my search script shut down because of
excessive usage...  Of course, if this radically decreases the size of
the index file that has to be read into memory, it may well improve
the overall performance of the search function, as only the returned
results would have to have the titles and properties uncompressed.
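
A rough sketch of that idea in Python, assuming each file's name and
properties are compressed as one separate record so that a search only
inflates the records it actually returns.  The record layout and the
pack_record/unpack_record names are made up for illustration; this is
not how swish-e stores its index.  Note that zlib adds a few bytes of
overhead per record, so very short strings may not shrink at all.

```python
import zlib

def pack_record(filename, properties):
    """Index time: compress one file's name and properties into a blob."""
    text = "\n".join([filename] + list(properties))
    return zlib.compress(text.encode("utf-8"))

def unpack_record(blob):
    """Search time: inflate a single record, only for a returned hit."""
    lines = zlib.decompress(blob).decode("utf-8").split("\n")
    return lines[0], lines[1:]

# Hypothetical index: file number -> compressed record
records = {
    1: pack_record("/docs/frogs.html", ["Green Frogs", "2000-06-01"]),
    2: pack_record("/docs/toads.html", ["Brown Toads", "2000-05-15"]),
}

hits = [2]  # file numbers returned by a search
results = [unpack_record(records[n]) for n in hits]
```

Record 1 is never decompressed here, which is the point: the cost is
paid per returned result, not per indexed file.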

> 2 - More data types. swish-e only searches for words. What about
> date and numeric data? There is no way now to extract data
> between two dates.

Hmm. Now that's an interesting idea.  Tagging a file with a submitted
date and allowing users to search between given dates probably
wouldn't be something that I'd use, but it sure sounds like a good
idea.  Probably only useful as a meta tag type feature rather than as
an element extracted from the displayed text.
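
As a minimal sketch of what a date search over a meta tag could look
like (in Python, with a made-up "submitted" value per file; nothing
here reflects an existing swish-e feature):

```python
from datetime import date

# Hypothetical: each indexed file carries a "submitted" meta value
submitted = {
    "a.html": date(2000, 5, 1),
    "b.html": date(2000, 6, 5),
    "c.html": date(2000, 6, 20),
}

def between(meta, start, end):
    """Return files whose date falls in [start, end], inclusive, by name."""
    return sorted(name for name, d in meta.items() if start <= d <= end)

june_first_half = between(submitted, date(2000, 6, 1), date(2000, 6, 15))
```

Storing the value as a real date rather than a word is what makes the
range comparison possible.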

> 3 - A perl/php module/library?

I would kill for a PHP module version of Swish-e.  It would make my
life so much easier.  I would like to see it implemented in a way that
options could be set one at a time, and then the search performed.  

e.g.:

swishe_setoption("max_results", 10);
swishe_setoption("offset_results", 10);
swishe_setoption("index_file", "/usr/home/index/test.swish");
$aryResults = swishe_search("green and frogs");
// each result is an array with 'score' and 'title' keys
foreach ($aryResults as $aryLine) {
  print $aryLine['score']." ".$aryLine['title']."\n";
}

Of course, this is a simplified example.  I'd also like to see a way
of accessing the metadata for a search and its associated index file
such as the total number of hits and what properties are available in
the index.  These would only be available after a search...  that
would make it hard to specify which properties we want returned...
hmm.  The setoption() function could be replaced with dedicated
functions for each of the options, but not sure if this is worthwhile.

I don't know if it would be worth building in the indexing code, but
it would make it easier to build an auto-swish in PHP.

> 4 - A server? 

I personally wouldn't have use for it, but I'm sure that others would.

> 5 - A get document option with some type of word highlighting?

That would be interesting to see, but not necessarily as part of the
base code.  It would be cool to see in the PHP module though :)
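
Highlighting outside the base code could be as small as this Python
sketch, which wraps whole-word, case-insensitive matches in a pair of
markers.  The highlight() name and the default markers are
assumptions for illustration only.

```python
import re

def highlight(text, words, before="<b>", after="</b>"):
    """Wrap each whole-word occurrence of the search words in markers."""
    if not words:
        return text  # nothing to highlight; avoid building an empty pattern
    alternatives = "|".join(re.escape(w) for w in words)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: before + m.group(0) + after, text)

marked = highlight("Green frogs eat flies", ["green", "frogs"])
```

This only handles plain text; highlighting inside real HTML would need
to skip tags and attributes.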

> 6 - Better sorting of results. Now there is only a descending sort.

I couldn't really use an expanded sort order, but I'm sure that others
could...

> 7 - Better use of memory. Now, swish-e uses a lot of memory. But
> memory is cheaper every day...

I don't mind the use of lots of memory on indexing, but cleaning up
the search code would be good.  My bias is explained above...

> 8 - More work on "XML".

It would be nice to see swish-e as a good XML search engine.  I have
been using it since a really old version of Kevin Hughes' original
Swish, and have become very comfortable with it.  It's a little easier
now using it via a PHP script rather than a hacked WWWWAIS, but I'd
hate to have to learn something else to handle XML searching when that
becomes an issue :)

I do have two other feature requests...

I would really like a better scoring system.  Now that we've got word
position, some compound of number of times the word appears in the
document, plus whether it's in the title/comments/whatever, plus how
close it is to the top of the document would be nice.  Are we storing
number of occurrences per file in the index anywhere?  If not, this
might be a good addition.
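
To make the compound concrete, here is one possible shape for it in
Python.  The weights, the title_bonus parameter, and the simple sum are
all assumptions picked for illustration, not a proposal for the actual
formula, and it assumes word positions are available per file.

```python
def score(positions, doc_length, in_title, title_bonus=2.0):
    """Combine hit count, title presence, and closeness to the top.

    positions  -- 0-based word positions of the term in the document
    doc_length -- total number of words in the document
    in_title   -- True if the term also appears in the title
    """
    if not positions or doc_length <= 0:
        return 0.0
    count = len(positions)
    # 1.0 when the first hit is the first word, falling toward 0.0 at the end
    closeness = 1.0 - min(positions) / doc_length
    bonus = title_bonus if in_title else 0.0
    return count + closeness + bonus

early_and_titled = score([0, 10], 100, True)
late_single_hit = score([50], 100, False)
```

A document with two hits near the top and a title match ranks well
above one with a single hit halfway down, which is the behaviour
described above.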

It would be nice to see an option for XOR in the search terms.  It's
not something that most users would understand, but it sure makes some
complex searches a lot easier.
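
In terms of result sets, XOR is just the symmetric difference: files
that match one term or the other, but not both.  A small Python sketch,
assuming a hypothetical in-memory index that maps each word to a set of
file numbers:

```python
def xor_search(index, term_a, term_b):
    """Return files containing exactly one of the two terms."""
    a = index.get(term_a, set())
    b = index.get(term_b, set())
    return a ^ b  # set symmetric difference

# Hypothetical index: word -> set of file numbers
index = {"green": {1, 2, 3}, "frogs": {2, 3, 4}}
matches = xor_search(index, "green", "frogs")
```

Files 2 and 3 contain both words and drop out, leaving files 1 and 4.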

Thanks very much for all that you have done in enhancing this
software, and for listening to my extremely selfish requests...

Jason
Received on Sat Jun 10 03:35:52 2000