Paul Lucas wrote:
> I've released SWISH++ 1.2. It's available via the SWISH-E page
> or via my software page: http://www.best.com/~pjl/software.html
> 1. SWISH++ now stores the list of stop-words in the generated
> index file so they can be ignored on searches later.
> Previously, using a stop-word in a query would always yield
> 0 results since the stop-word isn't in the index. After
> thinking about it, this is just plain stupid.
> Unlike SWISH-E, however, the *specific* stop-word(s) is/are
> reported, but then disregarded for the query that is *still*
> performed.
Hmmm... did I not suggest this on the list back in May, only to be pooh-poohed
by none other than yourself?
My only extra suggestion was to perform the search as above, but then
post-process the returned documents (not using the swish index, obviously)
to discard documents that do NOT contain the ignored stop-words (assuming
an AND search).
In the worst case, you'll simply get all the returned documents because
the stop-words appear in all of them. But in the non-worst case, you'll
get back a more focussed sub-set of documents that really *do* contain
all your search terms, even though one or more of them is a stop word.
In other words, the stop-words add value to the search if at all possible.
Actually implementing this could be problematic. If the number
of documents returned from the SWISH search were small, then a simple
run-time grep-like search of each document for the ignored stop-words might
be manageable. But it could add significant processing time.
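
As a rough sketch of that grep-like pass (in Python, and not anything that
SWISH-E or SWISH++ provides), something along these lines might do. The
function names are hypothetical, and it assumes the search results have
already been turned into a list of readable plain-text file paths and that
the ignored stop-words are known:

```python
import re
from pathlib import Path


def contains_all_words(text, words):
    """Return True if every word occurs in text as a whole word, ignoring case."""
    return all(
        re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE)
        for word in words
    )


def refine_results(paths, ignored_stop_words):
    """Keep only the documents that contain every ignored stop-word (AND search).

    Hypothetical helper: `paths` are the files returned by the index search,
    `ignored_stop_words` are the query words the search engine skipped.
    """
    kept = []
    for path in paths:
        try:
            text = Path(path).read_text(errors="replace")
        except OSError:
            continue  # unreadable file: drop it rather than guess
        if contains_all_words(text, ignored_stop_words):
            kept.append(path)
    return kept
```

Every returned document is read in full, so the cost grows with the number
and size of the hits, which is the processing-time concern raised above.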
Alternatively, if the SWISH-E or SWISH++ developers did not wish to go
down this path, then at least indicating in the return header which words
(if any) were ignored in the query would enable a post-processing CGI
routine to do its own search of the returned documents to refine the
search results further.
My understanding is that SWISH++ does report which words were ignored, so it
would be possible to do this kind of post-processing.
--
Dr Brendan Jones |
Honorary Associate |
Electronics Department |
Macquarie University | Email: brendan@mpce.mq.edu.au
NSW 2109 AUSTRALIA | WWW : http://www.mpce.mq.edu.au/~brendan/
Received on Mon Aug 3 21:57:52 1998