RE: possible problems with Swish-e searching instructions

From: Rossmann, Doralyn <doralyn(at)not-real.montana.edu>
Date: Fri Nov 07 2003 - 21:36:55 GMT

Ok...here's what I'm reading.

Find stuff with Juliet---let's say that's 100 items

Subtract stuff with Ophelia--let's say that's 25 items.  So now you've got
75 items.

Combine that result with those pages that also contain the word Pac.  Let's
say you had 200 items with Pac but only 50 of those also have Juliet in
them.  Let's say all of the Pac and (Juliet minus Ophelia) stuff is 40
items.

Ok...well, what if the Pac set had stuff with Ophelia?  You didn't ever
"subtract" Ophelia from Pac.  So you have Juliet stuff without Ophelia and
that result's been combined with Pac stuff.

Do you see what I'm saying?  You aren't ever subtracting Ophelia from Pac
and the documentation isn't addressing that.  

Doralyn

-----Original Message-----
From: moseley@hank.org [mailto:moseley@hank.org]
Sent: Friday, November 07, 2003 2:21 PM
To: Rossmann, Doralyn
Cc: Multiple recipients of list
Subject: Re: [SWISH-E] possible problems with Swish-e searching
instructions


Oh great.  The day I didn't get any coffee and now I have to think about 
boolean logic.

On Fri, Nov 07, 2003 at 08:22:07AM -0800, Rossmann, Doralyn wrote:

> These examples are both incorrect.  The search
> swish -w "juliet not ophelia and pac" -f myIndex
> 
> would retrieve files with "juliet" and then it would subtract those
> containing "ophelia" and then it would combine that whole result with
> pages containing "pac."

No.  If I understand what you are saying, you are misunderstanding the 
word "AND".  You say:

  combine that whole result with pages containing "pac."

which sounds like you mean "OR" as in you expect to get all files with 
"pac" in them, plus the others.  And that's wrong.

A teacher may say "All boys and girls are to go to lunch" but for 
boolean logic that would be nobody.  Nobody is in both sets.
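
To make the lunch example concrete, here is a minimal sketch using Python
sets.  The roster names are made up for illustration; the only point is that
boolean AND behaves like set intersection and boolean OR like set union.

```python
# Hypothetical class roster; the names are invented for this example.
boys = {"adam", "ben", "carl"}
girls = {"dana", "eve"}

# Boolean AND is set intersection: only members of BOTH sets.
boys_and_girls = boys & girls

# Boolean OR is set union: members of EITHER set.
boys_or_girls = boys | girls

print(boys_and_girls)         # set()
print(sorted(boys_or_girls))  # everyone goes to lunch
```

So what the teacher means in everyday speech is the OR result; the AND result
is empty.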

Or think of it this way:  Please pull all books in the library that have 
juliet and not ophelia.  Oh, and they must also have pac in them.

OK, so for:  juliet not ophelia and pac

  1) swish grabs all files with juliet

  2) swish grabs all files without ophelia and then ANDs
     the two lists.  Now we have a list of files that contain
     juliet but not ophelia.  (subtracted out ophelia in your language).

  3) Finally get a list of all files that have pac and AND that list
     with the previous list.  So you end up with files that have both
     pac and juliet but not ophelia.  You can't end up with a set that's
     larger than either of the previous lists.
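
The three steps above can be sketched with Python sets.  This is only an
illustration of the logic as described in this message, not swish-e's actual
code; the file names, the `index` dictionary, and the `files_with` helper are
all hypothetical.

```python
# Made-up index: file name -> words found in that file.
index = {
    "a.txt": {"juliet", "pac"},
    "b.txt": {"juliet", "ophelia", "pac"},
    "c.txt": {"juliet"},
    "d.txt": {"ophelia", "pac"},
    "e.txt": {"pac"},
}
all_files = set(index)


def files_with(word):
    """Return the set of files that contain the given word."""
    return {name for name, words in index.items() if word in words}


# 1) all files with juliet
step1 = files_with("juliet")

# 2) AND with all files that do NOT contain ophelia
step2 = step1 & (all_files - files_with("ophelia"))

# 3) AND with all files that contain pac
step3 = step2 & files_with("pac")

print(sorted(step1))  # ['a.txt', 'b.txt', 'c.txt']
print(sorted(step2))  # ['a.txt', 'c.txt']
print(sorted(step3))  # ['a.txt']
```

Note that b.txt and d.txt both contain pac and ophelia, yet neither survives
step 3.  Because step 3 is an intersection, every file in the final list
already passed the "not ophelia" test in step 2.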

So the documentation says the same thing:

 retrieves files which contain ``juliet'' and ``pac'' but not ``ophelia''

> The search swish-e -w "juliet not (ophelia and pac)" -f myIndex  
> 
> would retrieve pages that have both "ophelia and pac" in them and then it
> would subtract those pages from those pages with "Juliet" in them.  It
> would NOT subtract pages with just "ophelia" or just "pac" in them.  So the
> statement "retrieves files with ``juliet'' and containing neither
> ``ophelia'' nor ``pac''" is incorrect.

Yes, I think you are right here.

  1) swish grabs all files with juliet

  2) swish grabs all files with *both* ophelia and pac.
     swish negates that list so you end up with files 
     that contain neither word, or one of the terms, but not both.

  3) AND that with the list of juliet and you get:

  files with juliet but not ones with both pac and ophelia.

Which sounds a lot like 

  "juliet not (ophelia and pac)"

which seems easier to understand.
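
Here is a small Python sketch of the difference between what the query does
and what the documentation claims.  Again, the file names and the
`files_with` helper are hypothetical, and this only models the logic
described above, not swish-e internals.

```python
# Made-up index: file name -> words found in that file.
index = {
    "j.txt": {"juliet"},
    "jo.txt": {"juliet", "ophelia"},
    "jp.txt": {"juliet", "pac"},
    "jop.txt": {"juliet", "ophelia", "pac"},
    "op.txt": {"ophelia", "pac"},
}
all_files = set(index)


def files_with(word):
    """Return the set of files that contain the given word."""
    return {name for name, words in index.items() if word in words}


juliet = files_with("juliet")
ophelia = files_with("ophelia")
pac = files_with("pac")

# juliet not (ophelia and pac): drop only files that have BOTH words.
actual = juliet & (all_files - (ophelia & pac))

# "neither ophelia nor pac": drop files that have EITHER word.
# That is the query: juliet not (ophelia or pac)
documented = juliet & (all_files - (ophelia | pac))

print(sorted(actual))      # ['j.txt', 'jo.txt', 'jp.txt']
print(sorted(documented))  # ['j.txt']
```

The files with juliet plus just one of the other two words (jo.txt and
jp.txt) are where the two readings disagree.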

The documentation says:

retrieves files with ``juliet'' and containing neither ``ophelia'' nor
``pac''

Which sounds wrong to me.

Roy, who wrote those original docs?





-- 
Bill Moseley
moseley@hank.org
Received on Fri Nov 7 21:37:14 2003