Re: [SWISH-E:353] SWISH++ 1.2 released

From: Brendan Jones <brendan(at)not-real.mpce.mq.edu.au>
Date: Tue Aug 04 1998 - 04:47:36 GMT

Paul Lucas wrote:

> 	I've released SWISH++ 1.2.  It's available via the SWISH-E page
> 	or via my software page: http://www.best.com/~pjl/software.html
> 	1. SWISH++ now stores the list of stop-words in the generated
> 	   index file so they can be ignored on searches later.
> 	   Previously, using a stop-word in a query would always yield
> 	   0 results since the stop-word isn't in the index.  After
> 	   thinking about it, this is just plain stupid.
> 	   Unlike SWISH-E, however, the *specific* stop-word(s) is/are
> 	   reported, but then disregarded for the query that is *still*
> 	   performed.

Hmmm... did I not suggest this on the list back in May, only to be pooh-poohed
by none other than yourself?

My only extra suggestion was to perform the search as above, but then
post-process the returned documents (not using the swish index, obviously)
to discard documents that do NOT contain the ignored stop-words (assuming
an AND search).

In the worst case, you'll simply get all the returned documents because
the stop-words appear in all of them.  But in the non-worst case, you'll
get back a more focussed sub-set of documents that really *do* contain
all your search terms, even though one or more of them is a stop word.
In other words, the stop-words add value to the search if at all possible.

Actually implementing this could be problematic.  If the number of
documents returned from the SWISH search were small, then a simple
run-time grep-like search of each document for the ignored stop-words
might be manageable, but it could add significant processing time.
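
To make the grep-like pass concrete, here is a minimal sketch in Python.
It is not part of swish-e or swish++; the function names are made up for
illustration, and it assumes the returned documents are plain files that
can be read from disk and that a stop-word "matches" when it appears as a
whole word, ignoring case.

```python
import re
from pathlib import Path


def contains_all_words(text, words):
    """Return True if every word occurs in text as a whole word, ignoring case."""
    return all(
        re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE)
        for word in words
    )


def filter_by_stop_words(paths, ignored_words):
    """Keep only the documents that contain every ignored stop-word (AND search).

    Hypothetical helper: `paths` are the files returned by the index search,
    `ignored_words` are the stop-words that the search engine disregarded.
    """
    kept = []
    for path in paths:
        try:
            text = Path(path).read_text(errors="replace")
        except OSError:
            continue  # unreadable document: drop it rather than guess
        if contains_all_words(text, ignored_words):
            kept.append(path)
    return kept
```

Each returned document is read in full once, so the cost grows with the
number and size of the hits, which is exactly the processing-time concern
above.  With an empty list of ignored words, every readable document is
kept.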

Alternatively, if the swish-e or swish++ developers did not wish to go
down this path, then at least indicating in the return header which words
(if any) were ignored in the query would enable a post-processing CGI
routine to do its own search of the returned documents to refine the
search results further.

My understanding is swish++ does report which words were ignored, so it
would be possible to do this kind of post-processing.
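
As a rough sketch of the first step such a CGI routine would need, the
Python below separates the ignored words from the result lines in the
search output.  The "# ignored:" marker and the convention that other
header lines start with "#" are assumptions made for illustration; I have
not checked them against what swish++ 1.2 actually prints, so adjust the
marker to match the real header.

```python
def split_search_output(lines, marker="# ignored:"):
    """Split search output into (ignored_words, result_lines).

    Assumed format (hypothetical): a header line beginning with `marker`
    lists the ignored stop-words separated by whitespace, other header
    lines begin with "#", and every remaining non-empty line is one result.
    """
    ignored = []
    results = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(marker):
            # Everything after the marker is a whitespace-separated word list.
            ignored.extend(line[len(marker):].split())
        elif line and not line.startswith("#"):
            results.append(line)
    return ignored, results
```

The ignored words can then drive a second pass over the documents named
in the result lines, discarding those that lack any of them.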

-- 
Dr Brendan Jones        |
Honorary Associate      |
Electronics Department  |
Macquarie University    | Email: brendan@mpce.mq.edu.au
NSW 2109  AUSTRALIA     | WWW  : http://www.mpce.mq.edu.au/~brendan/
Received on Mon Aug 3 21:57:52 1998