
Re: indexing word pairs

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Sep 19 2001 - 14:12:46 GMT
At 03:36 AM 09/19/01 -0700, AusAqua wrote:
>A very brief question for the group.  Just wondering if there is a way to
>instruct swish to index a couple of words together in a phrase.
>
>I am aware that I can index the individual words contained in phrases and
>then use a search form to find documents in which they both occur, or could
>place them in quotes in a search form, to get an exact match.  However, I
>would actually like to get swish to treat words paired in a format such as
>"documents-new" or 'documents-old', etc.  as a unique string that it indexes
>as it appears.

I'm not really clear on what you want to do.  You can add a dash ("-") to
WordCharacters, and also add it to IgnoreFirstChar and IgnoreLastChar (so
stray leading and trailing dashes are stripped), to index hyphenated words
as single words.  But then you must search for them with the dash.

   -w 'documents-new'

and that would match documents-new with the dash included.
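To show what those three settings do together, here is a simplified Python model of the tokenizing step.  It is only an illustration under assumptions: the `tokenize` function and the letters-and-digits `LETTERS` set are hypothetical and are not swish-e's actual code or default character set.  A word is taken to be a run of word characters, after which IgnoreFirstChar/IgnoreLastChar characters are trimmed from its ends.

```python
import re

# Hypothetical word-character set for this sketch (letters and digits only).
LETTERS = "abcdefghijklmnopqrstuvwxyz0123456789"

def tokenize(text, word_chars, ignore_first="", ignore_last=""):
    """Simplified model, not swish-e's actual code.

    A word is a maximal run of characters from word_chars; characters in
    ignore_first are then stripped from its start and characters in
    ignore_last from its end. Runs that become empty are dropped.
    """
    pattern = "[" + re.escape(word_chars) + "]+"
    words = []
    for run in re.findall(pattern, text.lower()):
        run = run.lstrip(ignore_first).rstrip(ignore_last)
        if run:
            words.append(run)
    return words

text = "(documents-new) - see docs-"
print(tokenize(text, LETTERS))                  # ['documents', 'new', 'see', 'docs']
print(tokenize(text, LETTERS + "-", "-", "-"))  # ['documents-new', 'see', 'docs']
```

With the dash only in WordCharacters, the lone " - " and the trailing dash in "docs-" would be indexed as "-" and "docs-"; listing it in IgnoreFirstChar and IgnoreLastChar trims those away.  The query is tokenized the same way, which is why the search has to include the dash.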

If you have a short list of specific words that you want indexed but that
would not normally be indexed, you can use the 2.1-dev version and its
BuzzWords feature.  That lets you index words like "C++" even when the
plus sign is not in WordCharacters.
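As a rough illustration of the idea, the sketch below keeps any whitespace-separated chunk that exactly matches a buzzword and splits everything else on word characters.  This is a hypothetical model only: `tokenize_with_buzzwords` and `LETTERS` are invented for the example, and swish-e's real BuzzWords handling (for instance, how it treats surrounding punctuation) may differ.

```python
import re

# Hypothetical word-character set for this sketch (letters and digits only).
LETTERS = "abcdefghijklmnopqrstuvwxyz0123456789"

def tokenize_with_buzzwords(text, word_chars, buzzwords):
    """Hypothetical sketch of a buzzword pass, not swish-e's implementation.

    Whitespace-separated chunks that exactly match a buzzword are kept whole;
    every other chunk is split into runs of word_chars as usual.
    """
    pattern = re.compile("[" + re.escape(word_chars) + "]+")
    buzz = {b.lower() for b in buzzwords}
    words = []
    for chunk in text.lower().split():
        if chunk in buzz:
            words.append(chunk)  # indexed as-is, non-word characters included
        else:
            words.extend(pattern.findall(chunk))
    return words

print(tokenize_with_buzzwords("Learn C++ or C today", LETTERS, []))
# ['learn', 'c', 'or', 'c', 'today']
print(tokenize_with_buzzwords("Learn C++ or C today", LETTERS, ["C++"]))
# ['learn', 'c++', 'or', 'c', 'today']
```

The same approach would cover a handful of fixed pairs such as "documents-new" without changing WordCharacters for every other hyphenated word in the collection.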

>I have tried indexing such word pairs contained in my
>documents in the formats shown below, but have had no success in keeping
>them together; swish seems hellbent on teasing the words apart & indexing
>them separately.
>
>(documents-new)
>"documents-new"
>'documents-new'
>documents.new
>documents_new 

Experiment with the WordCharacters setting.  But keep in mind that if you
add, for example, a dash to WordCharacters, then every word containing a
dash will be indexed as a single word.  I prefer to use phrase searches.
If none of the characters above are in WordCharacters, you can find any of
those forms by searching:

     -w '"documents new"'

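But that will match every occurrence of that phrase, including places where
"documents" and "new" merely happen to be adjacent (such as "old documents.
New files..."), which might not be what you want.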
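To make that trade-off concrete, here is a simplified Python model of a phrase search over the default tokenization.  It assumes only letters and digits are word characters; `words` and `phrase_match` are hypothetical helpers, and a real engine like swish-e looks up word positions in its index rather than rescanning the text.  A phrase matches when its words appear consecutively.

```python
import re

def words(text):
    # Hypothetical default for this sketch: only letters and digits are word characters.
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_match(text, phrase):
    """Return True if the phrase's words occur consecutively in the text."""
    doc = words(text)
    target = words(phrase)
    n = len(target)
    return n > 0 and any(doc[i:i + n] == target for i in range(len(doc) - n + 1))

# Every format from the original question splits into "documents", "new".
for s in ["(documents-new)", '"documents-new"', "'documents-new'",
          "documents.new", "documents_new"]:
    print(s, phrase_match(s, "documents new"))  # all True

# The catch: ordinary adjacent words also match.
print(phrase_match("Archive old documents. New files go elsewhere.", "documents new"))  # True
```

Word order still matters, so "new documents" would not match; only the ordinary-text false positive remains.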



Bill Moseley
mailto:moseley@hank.org
Received on Wed Sep 19 14:19:07 2001