stopwords in queries

From: SRE <eckert(at)not-real.climber.org>
Date: Fri Feb 25 2000 - 08:59:11 GMT
I think there's a problem in search(), inside the #ifdef IGNORE_STOPWORDS_IN_QUERY section.

If someone has a test case available, try setting up a search
like "ted and steve" where ted is very common (a stopword) and
steve is not... then look at the results of "swish-e -w ted and steve"

Oops. It finds nothing even if steve is in the index, because
"ted" was removed and the remaining search words are "and steve",
which now start with a dangling rule. The transcript looks like this:
>>> # Search words: and steve
>>> err: no results

Here's the rewritten ifdef/endif section of search.c,
submitted for comments (did I get it right, or am
I totally off base on what this section was for?)

At least now it finds SOMETHING instead of returning an error.
For now, I'm writing skipped search words as comments... is
this a good idea? At least the calling script COULD pick them
up and let the user know that the search words had been changed.
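
To make the "calling script COULD pick them up" idea concrete, here is a
rough sketch of the caller side in C. It assumes only the comment format
used in the patch ("# Removed stopword: <word>"); parse_removed() is a
made-up helper name, not anything in swish-e.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Prefix written by the patched search code for each skipped word. */
#define REMOVED_PREFIX "# Removed stopword: "

/* Hypothetical helper: if `line` is a removed-stopword comment, copy the
   word (without the trailing newline) into `word` and return 1.
   Returns 0 for any other line. The word is truncated to size-1 chars. */
static int parse_removed(const char *line, char *word, size_t size) {
    size_t plen = strlen(REMOVED_PREFIX);
    size_t n;

    assert(line != NULL && word != NULL);
    if (size == 0 || strncmp(line, REMOVED_PREFIX, plen) != 0)
        return 0;

    n = strcspn(line + plen, "\r\n");   /* length up to end of line */
    if (n >= size)
        n = size - 1;
    memcpy(word, line + plen, n);
    word[n] = '\0';
    return 1;
}
```

A wrapper could run each output line through this and, for every hit,
tell the user that the word was dropped from the query.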

BTW, it doesn't seem to matter if a rule comes at the END
of the search words, so I left that alone. Bad idea?


#ifdef IGNORE_STOPWORDS_IN_QUERY
		/* Added JM 1/10/98. */
		/* completely re-written 2/25/00 - SRE */
		/* "ted and steve" --> "and steve" if "ted" is stopword --> no matches! */

		/* walk the list, looking for rules & stopwords to splice out */
		/* remove a rule ONLY if it's the first thing on the line */
		/*   (as when exposed by removing stopword that comes before it) */

		/* loop on FIRST word: quit when neither stopword nor rule */
		pointer1 = searchwordlist;
		while (pointer1 != NULL) {
			pointer2 = pointer1->next;
			if(!isstopword(pointer1->line) && !isrule(pointer1->line)) break;
			searchwordlist = pointer2; /* move the head of the list */
			printf("# Removed stopword: %s\n",pointer1->line);
			free(pointer1); /* toss the old head node */
			pointer1 = pointer2; /* reset for the loop */
		}
		if (pointer1 == NULL) {
			/* This query contained only stopwords! */
			printf("err: all search words too common to be useful\n.\n");
			exit(0);
		}

		/* loop on REMAINING words: ditch stopwords but keep rules (unless two rules in a row?) */
		pointer2 = pointer1->next;
		while (pointer2 != NULL) {
			if((isstopword(pointer2->line) && !isrule(pointer2->line))    /* non-rule stopwords */
			|| (    isrule(pointer1->line) &&  isrule(pointer2->line))) { /* two rules together */
				printf("# Removed stopword: %s\n",pointer2->line);    /* keep 1st of 2 rules */
				pointer1->next = pointer2->next;
				free(pointer2);
			}
			else {
				pointer1 = pointer1->next;
			}
			pointer2 = pointer1->next; /* re-read via pointer1: pointer2 may just have been freed */
		}
#endif /* IGNORE_STOPWORDS_IN_QUERY */
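
For anyone who wants to try the logic outside swish-e, below is a
minimal standalone sketch of the same two-pass pruning. Everything in it
is a stand-in: the struct layout, the isstopword() that treats only
"ted" as a stopword, the isrule() that knows "and", "or" and "not", and
the helpers make_list(), prune() and join() are assumptions for the
demo, not the real search.c definitions. It returns NULL instead of
exiting when nothing searchable is left.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the search word list (assumed layout). */
struct swline {
    char *line;
    struct swline *next;
};

/* Assumed stopword test: only "ted" counts as a stopword here. */
static int isstopword(const char *w) { return strcmp(w, "ted") == 0; }

/* Assumed rule test: the boolean operators. */
static int isrule(const char *w) {
    return strcmp(w, "and") == 0 || strcmp(w, "or") == 0
        || strcmp(w, "not") == 0;
}

/* Build a list from an array of n words, copying each word. */
static struct swline *make_list(const char **words, int n) {
    struct swline *head = NULL, **link = &head;
    for (int i = 0; i < n; i++) {
        struct swline *node = malloc(sizeof *node);
        assert(node != NULL);
        node->line = malloc(strlen(words[i]) + 1);
        assert(node->line != NULL);
        strcpy(node->line, words[i]);
        node->next = NULL;
        *link = node;
        link = &node->next;
    }
    return head;
}

static void free_node(struct swline *n) { free(n->line); free(n); }

/* Returns the pruned list, or NULL if no searchable word is left. */
static struct swline *prune(struct swline *head) {
    /* Pass 1: drop leading stopwords and leading rules. */
    while (head != NULL && (isstopword(head->line) || isrule(head->line))) {
        struct swline *next = head->next;
        free_node(head);
        head = next;
    }
    if (head == NULL)
        return NULL;

    /* Pass 2: drop later stopwords and the 2nd of two adjacent rules. */
    struct swline *prev = head;
    while (prev->next != NULL) {
        struct swline *cur = prev->next;
        int drop = (isstopword(cur->line) && !isrule(cur->line))
                || (isrule(prev->line) && isrule(cur->line));
        if (drop) {
            prev->next = cur->next;  /* unlink before freeing */
            free_node(cur);
        } else {
            prev = cur;
        }
    }
    return head;
}

/* Join the remaining words with single spaces into buf (size > 0). */
static void join(const struct swline *head, char *buf, size_t size) {
    buf[0] = '\0';
    for (; head != NULL; head = head->next) {
        if (buf[0] != '\0')
            strncat(buf, " ", size - strlen(buf) - 1);
        strncat(buf, head->line, size - strlen(buf) - 1);
    }
}
```

With these stand-ins, "ted and steve" prunes down to "steve", and a
query made up of nothing but "ted" comes back as NULL. Like the patch,
a trailing rule is left alone.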
Received on Fri Feb 25 04:03:32 2000